INTERFACING NETWORK SIMULATIONS WITH EMPIRICAL DATA


EXECUTIVE  SUMMARY 

Social network analysis (SNA) is the mathematical methodology of quantifying
connections between individuals and groups. It has become an important analytic tool for
analyzing terrorist networks, friendly command and control structures, arms trade, biological
warfare, and the spread of diseases, among other applications. This analysis provides a wealth of
information about how individuals in a network interact with each other. Much of the power of
SNA is derived from our ability to make prescriptions and predictions about network behavior.
There are advanced simulation packages readily available to conduct this analysis, but it is
particularly difficult to validate these simulation models. It is desirable to model the actor
behavior from the simulation in a statistical context and estimate relevant parameters from
empirical data. In this way, simulations could be grounded in robust analysis of real-world data.

We have developed and surveyed a number of statistical frameworks including the Link
Probability Model (LPM), the Exponential Random Graph Model (ERGM), and the Actor
Oriented Model (AOM). Each of these models has parameters that can be empirically obtained
from social network data to advise accurate simulations. To facilitate our analysis, we created
statistical tests and empirical frameworks that contribute to future researchers’ abilities to
conduct comparison studies.

Procedure 

This project utilized data collected from the IkeNet (McCulloh et al., 2008) and ELICIT
(Lospinoso et al., 2009) experiments conducted at the United States Military Academy, as well as
many popular data sets from the SNA literature. We construct a simple, baseline statistical model
called the LPM as well as a robust statistical test to determine how well simulated network data
fits empirically observed data. We compare the LPM to the ERGM using various data sets. We
then utilize an AOM specification to empirically estimate rate functions that can be used to
advise proper model specification within multi-agent simulation, then provide future directions
for studying the interface of AOM and constructuralist-based simulation packages like Construct.


Findings 

This report finds that while LPMs perform better than ERGMs in many of the data sets
we encountered across multiple domains, the AOM has the potential to outperform both. Future
work will be needed to test the efficacy of AOM in providing robust estimates of behavioral
parameters for use in accurate multi-agent simulations. We also reinforce the literature’s finding
that the AOM is able to determine statistically significant sociological phenomena within a
particular dataset, and to bridge the gap between empirically estimated parameters from a
social network model and a workable simulation package.
Utilization and Dissemination of Findings


This research is an emerging area of Network Science. As such, it has been and will
continue to be presented at a variety of academic conferences. The following is a list of
conference proceedings and publications which directly contributed to this report, and represent
dissemination of its content in conference settings:

1.  Lospinoso,  J.  "Constrained  Communication  Patterns:  An  Empirical  Estimation  of  Actor 
Oriented  Social  Network  Behavior."  International  Conference  on  Information  &  Knowledge 
Engineering  Proceedings,  Las  Vegas,  Nevada  (2009).  To  Appear. 

2. Lospinoso, J., McCulloh, I., and Carley, K. "Utility Seeking in Complex Social Systems: An
Applied  Longitudinal  Network  Study  on  Command  and  Control."  Artificial  Intelligence  and 
Social  Behavior  Modeling  Proceedings,  Edinburgh,  Scotland.  (2009) 
INTRODUCTION


Current  applications  of  Social  Network  Analysis  (SNA)  can  be  partitioned  into  two 
broad,  non-exclusive  groups:  those  that  provide  descriptions  of  the  social  network  under  study 
and  those  that  provide  prescriptions  or  predictions.  This  report  explains  how  descriptions  of  the 
social  network  can  advise  predictions  on  it.  Typically,  SNA  models  are  constructed  based  upon 
some  theory,  and  sometimes  their  parameters  are  fit  to  empirical  data.  Often,  however,  the 
descriptive  statistics  from  this  estimation  serve  as  the  only  prescription  and  prediction  power 
from  the  analysis.  We  advocate  an  extension  of  this  analysis  into  the  realm  of  social  network 
simulation  to  harness  the  full  power  of  the  empirical  data  available. 

We  survey  the  most  popular  SNA  empirical  models  to  determine  which  models  are  most 
effective  at  describing  different  kinds  of  data.  We  then  survey  popular  simulation  methods  and 
underlying  theories  to  determine  how  the  SNA  empirical  models  can  be  used  to  advise  analysts 
on  how  to  craft  the  simulations.  Along  the  way,  we  create  a  suite  of  statistical  tools  which  future 
researchers  can  use  to  determine  the  best  analysis  workflow  for  their  particular  applications. 


A  Background  of  Social  Network  Modeling 

SNA  examines  relationships  between  social  entities  (e.g.  people,  groups,  tasks,  beliefs, 
knowledge,  etc.).  These  entities  are  modeled  with  nodes  or  vertices  and  their  connections  or 
relationships  are  modeled  with  edges.  Not  all  nodes  are  connected,  and  some  nodes  may  have 
multiple  connections.  This  mathematical  model  is  applicable  in  content  areas  such  as 
communications,  information  flow,  and  group  or  organizational  affiliation  (Tichy,  1979;  Wasserman, 
1994). SNA thus relies heavily on graph theory to make predictions about network structure.

Nodes are defined in terms of a set of $n$ vertices, $V = \{v_1, v_2, \ldots, v_n\}$. The nodes are related
to each other with a set of edges $E = \{e_{ij}\}$, where $e_{ij}$ is a relationship between node $v_i$ and node $v_j$. A
social network is often shown as an adjacency matrix, where the rows and columns correspond to the
nodes and each cell $a_{ij}$ can take on any numerical value corresponding to the edge $e_{ij}$. In an
unweighted network, cells are Boolean and are represented as 0/1: the presence or absence of an edge
or relationship between nodes $i$ and $j$. Networks where relationships between nodes are always
mutual are called undirected networks, and their adjacency matrices will always be symmetric.
Directed networks, on the other hand, can model both mutual and directional relationships. A value
of 1 in cell $a_{ij}$ represents a directed relation from node $i$ to node $j$. In application, the diagonal of the
adjacency matrix is rarely populated with anything but zeros, since interactions from an entity to
itself are not generally interesting in a social network.

The  potential  complexity  of  interactions  within  even  a  small  network,  while  discrete,  grows 
exponentially  with  the  number  of  entities.  For  this  reason,  algorithmic  approaches  to  exploring  state- 
spaces  within  constrained  networks  become  computationally  challenging.  In  a  directed  network,  the 
number of possible relationships among nodes can be found by the following expression, where $n$
represents the number of nodes in the network:

$$n(n - 1) = n^2 - n$$


The number of possible configurations (states) of a network with a specified number of nodes ($n$) and
edges ($s$) can be thought of as the number of unique combinations of $s$ edges chosen from the
$n^2 - n$ possible directed edges:

$$\binom{n^2 - n}{s} = \frac{(n^2 - n)!}{s!\,(n^2 - n - s)!}$$

It follows that the total number of possible network configurations with $n$ nodes can be represented
by the following:

$$\sum_{s=0}^{n^2 - n} \binom{n^2 - n}{s} = 2^{n(n - 1)}$$

For example, a network of 30 nodes over a dichotomous and directed relation has $2^{870} \approx 7.87 \times 10^{261}$ unique
states.
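The counting argument above can be checked directly with exact integer arithmetic. The following is a minimal sketch; the function names are illustrative and do not come from the report.

```python
from math import comb

def possible_directed_ties(n):
    """Number of ordered pairs (i, j) with i != j, i.e. n^2 - n."""
    return n * (n - 1)

def states_with_edges(n, s):
    """Directed 0/1 networks on n nodes with exactly s edges."""
    return comb(possible_directed_ties(n), s)

def total_states(n):
    """All directed 0/1 networks on n nodes, over every possible edge count."""
    return 2 ** possible_directed_ties(n)

n = 30
digits = str(total_states(n))
print(f"{possible_directed_ties(n)} possible ties")
print(f"about {digits[0]}.{digits[1:3]} x 10^{len(digits) - 1} states")
```

For 30 nodes this reports 870 possible ties and roughly 7.87 x 10^261 states, which matches the figure quoted in the text.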


To understand the probability of network structures occurring, the degree of the nodes is
often investigated (Albert, 2002; McCulloh et al., 2007; Borgatti et al., 2006). The degree of a
node, $k_i$, is a simple network measure counting the number of edges going into or coming out of a
particular node. It is often a powerful and accurate indicator of who holds the power and
influence within a network (Newman, 2007; Casciaro et al., 1999). If we accept the notion that a
random network is one in which nodes have an equal and unchanging probability to have a
relationship with all other nodes in the network, random networks have a well-behaved underlying
distribution of degree measures. The degree of a node and the number of edges in a network
both follow a binomial distribution. As the network gets arbitrarily large with the expected degree
held fixed, the degree distribution converges to a Poisson distribution.

There  are  many  alternative  views  on  what  constitutes  a  random  network;  nevertheless, 
empirical  work  has  shown  that  social  networks  do  not  construct  themselves  in  the  image  of  a 
Binomial  random  graph  (Watts,  1998;  Barabasi,  2003).  Travers  (1969)  and  Milgram  (1967)  studied 
social  connections  in  the  United  States  and  discovered  surprisingly  short  path  lengths,  where  many 
strangers  were  connected  by  mutual  acquaintances.  This  was  termed  a  small-world  network.  A 
network  is  a  small  world  network  if  its  average  path  length  is  much  smaller  than  the  number  of  nodes 
in  the  network.  This  phenomenon  in  real-world  networks  is  popularly  known  as  “six  degrees  of 
separation”  (Guare,  1990).  Watts  and  Strogatz  (1998)  proposed  the  clustering  coefficient  as  a  graph 
level  measure  to  indicate  whether  a  graph  is  a  small-world  network.  The  clustering  coefficient  for  a 
directed graph is defined as

$$C_i = \frac{\left|\{e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E\}\right|}{k_i (k_i - 1)}$$

where $N_i$, the neighborhood of vertex $v_i$, is defined as its immediately connected neighbors.
Intuitively, this clustering coefficient tells us how dense a node's neighborhood is.

The degree $k_i$ of a vertex is the number of vertices, $|N_i|$, in its neighborhood $N_i$. Albert and Barabasi
(2002) review current methods of constructing random graphs throughout the field of Network
Science and compare the degree distribution, clustering coefficient, and average path length of
multiple  real-world  networks  with  various  types  of  random  networks.  They  find  that  real-world 
networks  have  a  higher  average  clustering  coefficient  and  a  shorter  average  path  length  than 
randomly  generated,  binomial  networks  with  the  same  number  of  nodes  and  edges.  Furthermore, 
they  show  that  several  networks  have  degree  distributions  that  follow  a  power-law  distribution, 
which  means  that  very  few  nodes  have  a  large  degree,  and  many  nodes  have  a  small  degree. 

All  of  these  models  observe  some  phenomena  in  nature  and  attempt  to  construct  some 
explanation  for  the  underlying  process  producing  the  observed  state.  Unfortunately,  it  is  difficult  to 
reverse-engineer  processes  and  validate  them  in  this  way.  This  paper  sets  out  to  survey  some  of  these 
models  (and  construct  a  new  one)  and  connects  them  with  simulation  packages  readily  available  to 
the  research  community.  In  doing  so,  we  bridge  the  gap  between  two  disparate  sections  of  social 
network  analysis. 


The  Way  Ahead 

This  paper  proceeds  by  first  creating  a  simple  SNA  model,  the  Link  Probability  Model,  in  the 
next  chapter.  The  following  chapter  compares  the  LPM  against  popular  competing  models.  After 
these  SNA  models  are  compared,  we  analyze  how  to  use  these  models  to  properly  parameterize 
Construct  (a  simulation  package),  and  then  test  its  performance.  In  the  last  chapter,  we  introduce  an 
advanced  topic  in  social  network  modeling  that  entails  computationally  intensive  statistical  methods 
and  comment  on  future  extensions. 
THE LINK PROBABILITY MODEL (LPM)

Barabasi  (2002)  proposed  the  scale-free  graph  which  creates  a  condition  on  the  random 
graph  that  the  degree  distribution  must  follow  a  power  law  distribution.  These  networks  were  shown 
to  resemble  some  real-world  networks.  While  scale-free  networks  may  appear  to  be  similar  to  real- 
world  networks  in  terms  of  structure,  they  are  not  a  sufficient  framework  to  truly  understand  the 
stochastic  nature  of  networks.  A  new  framework  for  random  networks  is  proposed,  based  upon 
empirical  data  collected  on  real-world  networks.  This  new  approach  produces  networks  that  have 
equivalent  properties  to  the  scale-free  networks;  however,  it  is  constructed  in  such  a  manner  as  to 
describe  the  close  relationships  between  some  nodes  and  distant  relationships  between  others.  This 
framework  holds  the  promise  of  a  new  line  of  research  to  explore  the  stochastic  behavior  of 
networks. 

The  Link  Probability  Model  posits  that  dynamic  networks  are  constructed  in  the  following  way: 
considering  each  dyadic  tie,  the  modeler  assigns  a  distribution  of  time  between  communications. 
Integrating  over  this  distribution  according  to  the  time  between  observed  networks  yields  adjacency 
matrices.  This  generation  process  defines  the  Link  Probability  Model.  Various  methods  can  be  used 
to  estimate  the  dyadic  distributions,  including  method  of  moments  and  maximum  likelihood. 
Alternately,  researchers  can  use  empirical  data  to  bootstrap  dyadic  distributions  or  simply  take 
averages  of  mean  time  between  communications.  The  following  sections  present  the  LPM  in  more 
formal  detail. 


Problem  Formulation 

Individuals  in  a  social  network  are  not  connected  to  other  individuals  with  uniform  random 
probability. The probability structure is more complex. Intuitively, there are some people with whom a
person will communicate, or be connected, more closely than others. In a study of email
communication conducted at the U.S. Military Academy (McCulloh et al., 2007), one subject
emailed  his  wife  more  than  ten  times  per  day  on  average,  while  other  people  that  he  worked  with 
received  an  email  from  him  once  or  twice  per  month.  For  this  reason,  real-world  networks  tend  to 
have  clusters  or  cliques  of  nodes  that  are  more  closely  related  than  others  (Newman,  2003;  Carley, 
1996;  Topper,  1999).  This  can  be  simulated  by  varying  the  probability  of  communication  between 
certain  nodes. 

Consider  a  group  consisting  of  15  individuals,  organized  into  three  subgroups.  Individuals 
within  each  subgroup  work  closely  together  and  communicate  more  frequently  than  they  do  with 
people  in  the  larger  group.  Each  day  individuals  may  communicate  with  others  in  the  group,  but 
most  likely  not  everyone.  If  we  suppose  that  an  individual  will  communicate  with  someone  in  their 
subgroup  with  probability  0.8  and  communicate  with  someone  outside  their  subgroup  with 
probability  0.2,  we  have  a  link  probability  model  (LPM)  shown  in  Figure  1. 

Using this LPM, Monte Carlo simulation was used to generate 5000 instances of the network.
At the 95% confidence level, the confidence interval around the average clustering coefficient was
0.463 ± 0.0014, compared with 0.329 ± 0.0024 in a random graph of uniform probability. The graph
generated with the LPM has a clustering coefficient that is comparable to that of a scale-free graph with the
same number of nodes and edges. It can be conjectured that the clustering coefficient will become
greater as the within-group edge probability increases. Furthermore, as the probability of certain key
nodes being connected to others increases, the degree distribution will more closely follow a power
law distribution. The newly proposed random network, therefore, achieves performance equivalent
to that of the scale-free network in modeling real-world networks, yet preserves the flexibility to model
dyadic relationships between nodes.
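The 15-person example can be sketched as a small Monte Carlo experiment. This is an illustration under stated assumptions, not the procedure used in the report: ties are drawn as independent directed edges, the clustering coefficient is the directed neighborhood-density form, and all function names are hypothetical. Because those choices may differ from the original study, the printed values need not match the interval quoted above.

```python
import numpy as np

def block_lpm(n_groups=3, group_size=5, p_within=0.8, p_between=0.2):
    """Link probability matrix for equally sized subgroups (no self-ties)."""
    groups = np.repeat(np.arange(n_groups), group_size)
    same = groups[:, None] == groups[None, :]
    P = np.where(same, p_within, p_between).astype(float)
    np.fill_diagonal(P, 0.0)
    return P

def sample_network(P, rng):
    """Draw one directed 0/1 adjacency matrix; edges assumed independent."""
    return (rng.random(P.shape) < P).astype(int)

def clustering_coefficient(A):
    """Average local clustering coefficient of a directed 0/1 network."""
    n = A.shape[0]
    neighbor = (A + A.T) > 0  # j neighbors i if a tie exists in either direction
    coeffs = np.zeros(n)
    for i in range(n):
        idx = np.flatnonzero(neighbor[i])
        k = idx.size
        if k >= 2:
            # directed edges among the neighbors, out of k(k-1) possible
            coeffs[i] = A[np.ix_(idx, idx)].sum() / (k * (k - 1))
    return coeffs.mean()

def monte_carlo_clustering(P, n_draws=5000, seed=0):
    """Mean clustering coefficient and a normal-approximation 95% half-width."""
    rng = np.random.default_rng(seed)
    values = np.array([clustering_coefficient(sample_network(P, rng))
                       for _ in range(n_draws)])
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(n_draws)
    return values.mean(), half_width

P = block_lpm()
mean_cc, hw = monte_carlo_clustering(P, n_draws=500)
print(f"average clustering coefficient: {mean_cc:.3f} +/- {hw:.4f}")
```

Replacing `block_lpm()` with a matrix of empirically estimated edge probabilities would turn the same loop into a generator for the network under study.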
Figure 1. Network Probability Matrix.


The  edge  probabilities  can  be  derived  from  empirical  data  in  several  ways.  Given  network 
data  collected  over  multiple  time  periods  on  a  group  of  subjects,  the  edge  probabilities  can  be 
estimated by the proportion of edge occurrences, $e_{ij}$, for each cell in the adjacency matrix, $a_{ij}$. In the
case of communication networks, statistical distributions can be fit to the time between messages for
each potential edge in the network. For a specified period of time, $t$, the edge probability $p$ for each
pair of entities $i$ and $j$ can be found. Let $x_{ij}$ be the time between messages in a communication network.
The probability density function for any $x_{ij}$ can then be defined as $f_{ij}(x \mid \theta_{ij})$, where $\theta_{ij}$ is the set of
parameters for the distribution. Then, the probability, $p$, of an edge occurring within some time
period $t$ is the probability that $x_{ij} < t$, which can be expressed as

$$p = \int_0^t f_{ij}(x \mid \theta_{ij})\,dx$$

In practice, the function $f_{ij}(x \mid \theta_{ij})$ must be estimated using techniques such as maximum likelihood
estimation from empirical data collected on the group being studied. It may be desirable to construct
a network based on a restriction such as, “two emails within a time period demonstrate a relationship,
but one does not.” In this case, it is necessary to compose a function of random variables. If
represents  the  probability  density  function  of  time  between  two  sets  of  two  emails  and 
represents  the  probability  density  function  of  time  between  one  set  of  two  emails,  then  the  following 
is  true  under  certain  assumptions: 


It  is  possible  to  generalize  this  idea;  if  probability  that  x  or  more  communications 

occur  within  time  t,  then  the  following  is  true: 

This newly proposed framework for viewing the probability space of a social network preserves the
same flexibility for modeling dyadic relationships; in addition, it provides researchers with a means to
understand the probability space of the network and thus devise more robust and appropriate
statistical tests for social network analysis.
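One way to make the composition of random variables concrete is sketched below. It rests on assumptions that are not stated in the text: successive times between messages on a dyad $(i, j)$ are independent with a common density $f_{ij}$, and the observation window of length $t$ starts at a message. The symbols $g_{ij}$, $S_m$, $N_{ij}(t)$ and the count $m$ are notation introduced here for illustration. Under those assumptions, the time spanned by two consecutive gaps has the convolution density $g_{ij}$, and the probability of $m$ or more messages within $t$ is the distribution function of an $m$-fold sum:

```latex
g_{ij}(x) = \int_0^x f_{ij}(u)\, f_{ij}(x - u)\, du,
\qquad
P\bigl(N_{ij}(t) \ge m\bigr) = P(S_m \le t) = \int_0^t f_{ij}^{*m}(u)\, du,
\quad
S_m = x_{ij}^{(1)} + \cdots + x_{ij}^{(m)} .
```

Here $f_{ij}^{*m}$ denotes the $m$-fold convolution of $f_{ij}$ with itself. For lognormal gaps this has no closed form, so it would be evaluated numerically or by simulation.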


Example  Problem  Solution 

Researchers  at  the  U.S.  Military  Academy  monitored  the  e-mail  traffic  of  24  mid-grade 
(senior  captains  and  junior  majors)  Army  officers  for  24  weeks  as  they  were  in  a  one  year  graduate 
program  at  Columbia  University.  Email  within  the  group  was  considered,  while  email  to  outside 
parties was thrown out. The group had been organized with a formal leadership structure among the
24  officers.  They  all  lived  on  the  West  Point  Military  Installation,  and  they  had  regular  social  events 
for  the  officers  and  their  families.  The  degree  distribution  followed  a  power  law  distribution  like  the 
social  networks  analyzed  by  Barabasi  and  Albert  (2002),  and  Newman  (2003).  The  time  between 
emails  for  each  possible  pair  of  nodes  was  calculated.  There  were  only  65  directed  pairs  of  nodes 
that  had  greater  than  30  messages  over  the  course  of  24  weeks.  Statistical  distributions  were  fit  to  the 
time  between  email  for  the  65  pairs  of  nodes.  All  of  them  followed  a  lognormal  distribution.  Figure 
2  shows  the  empirical  distribution  of  one  directed  pair  and  four  distributions  fit  to  the  data: 
exponential, lognormal, Pareto, and Zipf.
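The fitting step and the resulting edge probability can be sketched as follows. This is a minimal illustration, assuming lognormal times between messages as reported above. The data are synthetic (they are not the study's e-mail logs), and the function names and the seven-day window are hypothetical choices.

```python
import math
import numpy as np

def fit_lognormal(gaps):
    """Maximum likelihood estimates (mu, sigma) for lognormal inter-message times."""
    logs = np.log(np.asarray(gaps, dtype=float))
    return logs.mean(), logs.std()  # the MLE uses the ddof=0 standard deviation

def edge_probability(t, mu, sigma):
    """P(x < t) for a lognormal(mu, sigma) variable: the lognormal CDF at t."""
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Synthetic inter-message times in days for one directed pair (NOT study data).
rng = np.random.default_rng(42)
gaps = rng.lognormal(mean=1.0, sigma=0.8, size=200)

mu_hat, sigma_hat = fit_lognormal(gaps)
p_week = edge_probability(7.0, mu_hat, sigma_hat)
print(f"mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, P(message within 7 days)={p_week:.2f}")
```

Repeating the fit for every directed pair with enough messages fills in one cell of the link probability matrix per pair.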

One  could  conjecture  that  the  parameters  of  the  lognormal  distributions  may  be  dependent  upon 
various  social  factors,  such  as  formal  position  in  the  network,  friendship,  common  interest,  etc. 
Unlike  traditional  social  network  analysis,  using  the  LPM,  an  analyst  can  use  the  edge  probabilities 
as  dependent  variables  to  study  the  causes  of  relationships,  communication  frequency,  and  ultimately 
network  structure. 
Figure 2. Distributions Fit to Time Between E-mails in Army Officer Study (legend: Data, Zipf, Lognormal, Exponential).

Discussion 


A  new  approach  to  modeling  a  random  network  has  been  proposed  that  resembles  real-world 
networks,  preserves  dyadic  relationships,  and  can  be  estimated  from  empirical  data.  While  the 
approach  is  surprisingly  simple,  it  opens  the  door  for  many  new  analysis  opportunities  in  social 
network  analysis.  The  cell  entries  in  the  LPM  can  be  treated  as  dependent  variables,  while  various 
properties  describing  the  dyadic  relationships  between  nodal  pairs  can  be  used  as  independent 
variables. This will reduce variance in the model and increase the coefficient of determination,
thereby  explaining  the  complex  behavior  of  a  social  network  much  better  than  existing  methods. 

Other  research  building  from  this  new  approach  to  modeling  a  random  network  can  include 
building  empirical  distributions  of  social  network  measures.  This  newly  proposed  framework  allows 
analysts  to  randomly  generate  instances  of  social  networks  under  investigation.  Parameters  of 
distributions  for  social  network  measures  can  then  be  estimated  using  Monte  Carlo  simulation. 

Consideration  of  the  probability  space  of  entity  level  communications  is  imperative  for  many 
studies  of  social  networks.  Many  considerations  for  designing  social  experiments  rely  on  conventions 
within  the  field.  When  constructing  interaction  matrices,  experimenters  must  choose  many 
parameters which may change the conclusion of the study. The experimenters of the U.S. Military
Academy e-mail study had to choose how many emails between two entities demonstrate a
relationship  to  create  an  unweighted,  directional  network.  To  study  the  dynamics  of  the  network,  the 
experimenters  further  needed  to  determine  regular  intervals  to  sample,  which  allowed  for  a  temporal 
analysis.  By  instead  fitting  distributions  to  the  empirical  data,  experimenters  could  use  statistical 
techniques  to  manipulate  random  variables  and  sidestep  the  selection  of  the  potentially  influential 
aforementioned  parameters. 

EMPIRICAL  VALIDATION  OF  THE  LPM 

Presently, many structure-based frameworks are used in the network science community
for the simulation of networks. These frameworks are based on the presence of triads, dyads,
cliques and other network structural components. However, these frameworks do not always
consider all of the factors that contribute to the dyadic relationship between agents. In a network,
an agent may not be influenced by the occurrence of a triad among other agents or by the fact that
certain agents in the network have dyadic ties. The agent is mainly concerned with his own
dyadic relationships, leading to an underlying dynamic equilibrium in the network.

This  dynamic  equilibrium  is  based  on  an  underlying  edge  probability  structure  that 
contains  a  probability  that  each  agent  will  communicate  with  every  other  agent  in  the  network. 
The  underlying  probability  structure  of  a  network  can  remain  independent  of  observations  at  any 
instance  in  time  and  be  constant  in  the  network  under  certain  assumptions  about  its  longitudinal 
nature  and  outside  factors.  A  single  observation  of  a  tie  does  not  necessarily  designate  a 
relationship  between  two  agents,  since  the  communication  could  have  been  made  spuriously.  On 
the  other  hand,  a  single  observation  of  the  lack  of  a  tie  does  not  designate  the  absence  of  a 
relationship — agents  are  not  continuously  communicating  with  every  agent  they  have  a 
relationship  with  at  every  instance  in  time.  While  a  snapshot  of  the  network  at  an  instance  in  time 
does  not  indicate  the  dyadic  relationships  between  agents,  this  snapshot  is  based  on  the 
underlying  network  probability  that  each  agent  will  communicate  with  every  other  agent. 

A new framework is proposed for the simulation of networks that is based on the
underlying probability structure of the dynamic equilibrium. This framework is the link
probability model (LPM) proposed by McCulloh, Lospinoso, and Carley (2007). The LPM
estimates the edge probabilities for each dyadic pair in the network. Probability estimation can
range from a simple proportion of communications in a series of observations to more
complex distributions, depending on the amount and type of data present. This framework can be
used to simulate a network regardless of its topology: random, small-world, scale-free, cellular,
etc. LPMs require that a network be in some dynamic equilibrium, and they represent
the long-term likelihoods that a particular dyad is observed in some state.

The  edge  probability  structure  of  the  underlying  dynamic  equilibrium  remains  constant  in 
the  network  while  the  network  is  at  a  stable  state.  However,  the  underlying  probabilities  may 
change as shocks to the network take place. These probabilities may then stabilize as the network
returns to its dynamic equilibrium. Using Monte Carlo simulation over an LPM will yield the
underlying distributions of network measures (assuming that the network is in dynamic
equilibrium). These underlying distributions can be used in change detection and allow us to
statistically predict shocks to the network and determine when significant changes occur, as in
McCulloh (2009).


Background 

Social network analysis is a theoretical framework that examines the relationships
between social entities (e.g. people, groups, organizations, beliefs, knowledge, etc.). These
objects are known as nodes and their connections are referred to as edges. Not all nodes are
connected; however, some nodes are connected with multiple relationships. This network
framework is applicable in a plethora of content areas such as communications, information
flow, and group or organizational affiliation (Tichy and Tushman, 1979). Social network
analysis relies heavily on graph theory to make predictions about network structure.


In 1959 mathematicians Paul Erdős and Alfréd Rényi made revolutionary discoveries in
the evolution of random graphs. In their eight papers Erdős and Rényi evaluate the properties of
random graphs with $n$ vertices and $N$ edges. For a random graph $G$ containing no edges, at each
time step a randomly chosen edge among the $\binom{n}{2}$ possible edges is added to $G$. The resulting graph
contains $N$ edges, and each of the $\binom{\binom{n}{2}}{N}$ possible graphs is equiprobable. Therefore, once
an edge is chosen from the equiprobable edges, the next edge is chosen among the remaining
$\binom{n}{2} - 1$ edges, and this process is continued so that if $k$ edges are fixed, all remaining $\binom{n}{2} - k$
edges have equal probabilities of being chosen (Erdős & Rényi, 1960). A general model
used to generate random graphs is as follows (Chung & Graham, 1998):


For a given $p$, $0 < p < 1$, each potential edge of $G$ is chosen with probability $p$,
independent of other edges. Such a random graph is denoted by $G_{n,p}$, where each
edge is determined by flipping a coin, which has probability $p$ of coming up
heads.


In this model of random graphs, every potential edge has the same probability of occurring.
This random graph model also assumes that all nodes in the graph are present
at the beginning and that the number of nodes in the network is fixed and remains the same
throughout the network’s life. Additionally, all nodes in this model are considered equal and are
indistinguishable from each other (Barabasi & Albert, 1999).
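The coin-flipping construction can be sketched in a few lines. This is an illustrative sketch only: the function name `gnp`, the seed, and the values of `n` and `p` are arbitrary choices, and the graph is taken to be undirected.

```python
import numpy as np

def gnp(n, p, rng):
    """Sample an undirected G(n, p) graph as a symmetric 0/1 adjacency matrix."""
    # One independent coin flip per unordered pair (strict upper triangle).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(7)
n, p = 200, 0.1
A = gnp(n, p, rng)
degrees = A.sum(axis=1)
print("edges:", A.sum() // 2, "expected:", p * n * (n - 1) / 2)
print("mean degree:", degrees.mean(), "expected:", p * (n - 1))
```

Because every pair uses the same `p`, each node's degree is a binomial count over the other `n - 1` nodes, which is the sense in which the nodes of this model are indistinguishable.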

Utilizing Erdős’ theory of random graphs as well as the class of uniform distributions
associated with these graphs, Holland and Leinhardt (1971) developed a variety of statistical tests
for the analysis of social networks. Using a uniform distribution, these tests spread the total
probability mass equally over all possible outcomes, therefore giving an equal probability to the
existence of an edge between any two nodes in the network. These statistical tests were used to
develop a reference frame or constant benchmark to which observed data could be compared to
determine how “structured a particular network was, or how far the network deviated from the
benchmark” (Wasserman and Faust, 1994).

In 1969, Mark Granovetter proposed the strength of weak ties. In Granovetter’s social
world, our close friends are often friends with each other as well, leading to a society of small,
fully connected circles of friends who are all connected by strong ties. These small circles of
friends are connected through weak ties of acquaintances. In turn, these acquaintances have
strong connections within their own circles of friends. The weak ties connecting circles of friends
play an imperative role in numerous social activities, from finding a job to spreading the latest
fad. Close friends who have strong connections are often exposed to the same information;
therefore, weak ties are activated to bridge out of our circle of friends and into the outside world
(Granovetter, 1973).

Building on Granovetter's model, Duncan Watts and Steven Strogatz (1998) developed
the clustering coefficient, which divides the number of links among a node's first-order
connections by the number of links possible among those first-order connections. The clustering
coefficient illustrates the interconnectivity of a circle of friends: a value close to 1 indicates
that nearly all first-order connections of a node are connected with each other. Conversely, a
value close to 0 shows that a node's first-order connections are connected only through that node.
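The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original study; the function name `local_clustering`, the toy network `friends`, and the convention of returning 0 for nodes with fewer than two neighbors are all assumptions made for the example.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no pairs of neighbors to compare
    # Count the links that actually exist among the node's neighbors.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    possible = k * (k - 1) / 2
    return links / possible

# Hypothetical toy network: A knows B, C and D; only B and C know each other.
friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(friends, "A"))  # one of three possible links exists
```

Node A has three first-order connections and only one of the three possible links among them exists, so its coefficient is one third; node B's two connections know each other, so its coefficient is 1.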

The Watts-Strogatz model of small world networks is the first to reconcile clustering with
the characteristics of random graphs. In the Watts-Strogatz model, each node is
directly connected to each of its nearest neighbors, resulting in a high clustering coefficient.
With clustering alone, this model has a high average path length between two random nodes.
However, adding only a few random links between nodes of different clusters drastically decreases
the average separation between nodes. Despite these random links, the clustering coefficient
remains relatively unchanged (Watts & Newman, 1999). The original Watts-Strogatz model did not add
extra links to the graph but randomly rewired some of the existing links to distant nodes; the
addition of random links was proposed by Watts and M. Newman.
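The effect of a few random links can be checked numerically. The sketch below assumes a ring lattice as the clustered starting network and adds shortcuts rather than rewiring (the Watts and Newman variant mentioned above); the function names, the network size of 200, and the choice of 10 shortcuts are illustrative assumptions.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def add_shortcuts(adj, count, rng):
    """Return a copy of adj with `count` new random links added."""
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = list(new)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in new[u]:
            new[u].add(v)
            new[v].add(u)
            added += 1
    return new

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    """Mean of the local clustering coefficients of all nodes."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lattice = ring_lattice(200, 4)
small_world = add_shortcuts(lattice, 10, rng)
print(average_path_length(lattice), average_clustering(lattice))
print(average_path_length(small_world), average_clustering(small_world))
```

Comparing the two printed lines shows the pattern described in the text: the shortcuts shorten the average path while the average clustering coefficient moves only slightly.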

According to Albert-Laszlo Barabasi, the random graphs of Erdos and Renyi are
rarely found in the real world. Barabasi found that many real world networks have some
nodes that are connected to many nodes and others that are connected to few. His
empirical tests showed that the distribution of the number of connections in many networks
follows a power-law distribution. These networks lack the characteristic scale in node
connectivity present in random graphs and are therefore scale-free (Barabasi, 2003). Because
the number of connections follows a power-law distribution, hubs emerge among the nodes in
the network. A hub is a highly connected node that holds a large share of the links in the network
and creates short paths between any two nodes in the network.

Barabasi's model of scale-free networks is built around preferential attachment.
At each time step a new node is added to the network, illustrating the principle that
networks are assembled one node at a time (Barabasi & Albert, 1999). Assuming that each new
node connects to the existing nodes of the network with two links, the probability that the new
node will choose a given node is proportional to the number of links that node has.
Therefore, a node with more links has a higher probability of receiving a new connection. This
creates a “rich get richer” scenario where nodes with many links continue to grow by collecting
new links while newer nodes with lower degrees do not collect as many links (Barabasi & Albert,
1999).

In a scale-free network model where nodes make connections based completely on
preferential attachment, the probability that a new node will connect to a node with $k$ links is
given by $k / \sum_i k_i$ (Barabasi, 2003). This causes the first nodes in the network to develop
into hub nodes because they have had the longest time to collect links. However, it is not always
the case that the first nodes in a network develop into the biggest hubs.
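A short simulation makes the growth rule concrete. The sketch below is an illustration under stated assumptions rather than the authors' code: it starts from a fully connected triangle, adds two links per new node as in the text, and the names `attachment_probability` and `grow_network` are hypothetical.

```python
import random

def attachment_probability(k, degrees):
    """Probability that a new node links to a node of degree k: k / sum_i k_i."""
    return k / sum(degrees)

def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network one node at a time using preferential attachment.

    Starts from a fully connected triangle (an assumption of this sketch)
    and returns the list of node degrees.
    """
    rng = random.Random(seed)
    degrees = [2, 2, 2]  # three seed nodes, each linked to the other two
    for _ in range(n_nodes - 3):
        existing = list(range(len(degrees)))
        targets = set()
        while len(targets) < links_per_node:
            # random.choices draws with probability proportional to the weights
            targets.add(rng.choices(existing, weights=degrees)[0])
        for t in targets:
            degrees[t] += 1
        degrees.append(links_per_node)
    return degrees

degrees = grow_network(1000)
# Largest hub versus the median degree
print(max(degrees), sorted(degrees)[len(degrees) // 2])
```

Because each draw is weighted by current degree, early and well-connected nodes tend to accumulate links, which is the “rich get richer” behavior described above.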

To account for newer nodes overtaking older nodes as hubs, Barabasi constructed the
fitness model. Fitness is a node's ability to collect links relative to every other node in the
network and is based on competition in complex systems (Barabasi & Bianconi, 2001). In this
model a node's attractiveness is not determined completely by its number of links; instead,
preferential attachment is driven by the product of the number of links a node has and its fitness.
The probability that a new node will connect to a node with $k$ links and a fitness of $\eta$ is
$k\eta / \sum_i k_i \eta_i$ (Barabasi & Bianconi, 2001). Nodes in this model acquire links
following the power-law distribution of the scale-free model; however, the dynamic exponent
$\beta$, which determines how fast a node acquires new links, is different for each node. This
exponent is proportional to a node's fitness; therefore, a node that is twice as fit as another
node will obtain links twice as fast because its dynamic exponent is twice as large. This
“fit-get-rich” model allows nodes to become hubs based on their attractiveness regardless of when
they enter the network (Barabasi & Bianconi, 2001).
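The fitness-weighted connection probability can be evaluated directly. In the sketch below the function name and the degree and fitness values are hypothetical, chosen only to show a fit newcomer out-competing an older hub.

```python
def fitness_attachment_probabilities(degrees, fitness):
    """Connection probability for each node: k_i * eta_i / sum_j k_j * eta_j."""
    weights = [k * eta for k, eta in zip(degrees, fitness)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: an old hub (10 links, fitness 0.2) versus a newcomer
# with few links but high fitness (3 links, fitness 0.9) and a third node.
probs = fitness_attachment_probabilities([10, 3, 2], [0.2, 0.9, 0.5])
print(probs)
```

Here the newcomer's product of links and fitness (2.7) exceeds the old hub's (2.0), so the newcomer is the most likely target for the next link despite having fewer links.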

In contrast to the scale-free network model, Barabasi developed the “winner-take-all”
model, which strongly portrays monopolies. The “winner-take-all” model consists of a single
hub and many tiny nodes. This network develops a star topology, and nodes do not acquire links
following a power-law distribution.

McCulloh and Lospinoso (2007) proposed a new framework
for random communication networks over time, based on empirical data collected on real world
networks. This framework estimates distributions for the time between communication
messages; then, for a given time interval, the probability of an edge occurring in the network
is calculated for every ordered pair of nodes. These probabilities can be constructed through
multiple techniques. To derive the probabilities from empirical data collected over several time
periods, the proportion of edge occurrences can be used to estimate the probability for each cell
in the adjacency matrix.

These probabilities are displayed in a network probability matrix where each cell is the
probability that node i communicates with node j. This framework is capable of generating
networks that are similar to scale-free networks. More generally, this model can be used to
construct any network topology: Erdos-Renyi random, Watts-Strogatz small world, Albert-Barabasi
scale-free, star, cellular, etc. The McCulloh-Lospinoso model is estimated from empirical data and
can be used to simulate realistic observations of relationships in specific organizations.


Data 


This research evaluates the density of two real world networks to find the underlying
distribution of network density. The first data set was collected from a war fighting simulation at
Ft. Leavenworth, KS in April 2007 by Craig Schreiber and Lieutenant Colonel John Graham.
There were 99 participants in the experiment, who were monitored over the course of four days
while data were being collected. A set of 68 participants served as staff members in the
headquarters of the brigade conducting the exercise. The data capture the interactions of agents
in the network and were collected by monitoring communications throughout the simulation.

The second data set is from a war fighting simulation at Ft. Leavenworth, KS in 2005,
also collected by Craig Schreiber and Lieutenant Colonel John Graham. This data set contains
156 agents who were monitored over the course of nine iterations of the simulation. The data
capture the communication among agents in the network and were collected by monitoring
communications throughout the simulation. For the duration of this chapter, the Ft. Leavenworth
2007 data sets will be referred to as Network 1 and the Ft. Leavenworth 2005 data sets will be
referred to as Network 2.


Method 

This research explores the distribution of the density measure in two simulated networks
using the network probability matrix. To simulate a network, a link
probability model (LPM) must first be created. Once the data sets for Network 1 were trimmed of the
scripted agents, they were symmetrized across the main diagonal in the Organizational Risk
Analyzer (ORA) to account for the lack of directionality of communication in the data.
Symmetrizing the data also corrects for the informant error of agents not reporting other agents
they have communicated with. Next, the Ft. Leavenworth 2007 data sets were dichotomized
to remove the weighting set by the participants. Once the data are dichotomized, a one represents
communication between two agents and a zero represents the lack of communication between
two agents. To construct the LPM, all eight data sets were compiled into a single matrix
containing the total number of discrete time periods in which each agent communicated with each
other agent. This matrix was then divided by the number of discrete time periods to determine
the underlying edge probabilities for the network in dynamic equilibrium.

The Network 2 data sets were collected as unweighted data, so they did not have to be
dichotomized. It was also unnecessary to trim these data sets. The nine data sets from this
network were symmetrized across the main diagonal in ORA to correct for the informant error of
agents not reporting other agents they have communicated with. To construct the LPM, all nine data
sets were compiled into a single matrix containing the total number of discrete time periods in
which each agent communicated with each other agent. This matrix was then divided by the number of
discrete time periods to determine the underlying edge probabilities for the network in dynamic
equilibrium.
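The preparation steps described for both networks (symmetrize, dichotomize, then average over time periods) can be sketched as follows. This is an illustration only: the study performed these steps in ORA, whose exact symmetrization rule is not stated here, so taking the maximum of the two directed entries is an assumption, as are the function names and the toy data.

```python
def symmetrize(matrix):
    """Treat a link as mutual if either agent reported it (assumed rule)."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]

def dichotomize(matrix):
    """Replace any positive weight with 1 and everything else with 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]

def link_probability_matrix(periods):
    """Proportion of time periods in which each pair of agents communicated."""
    n = len(periods[0])
    cleaned = [dichotomize(symmetrize(m)) for m in periods]
    return [
        [0.0 if i == j else sum(m[i][j] for m in cleaned) / len(cleaned)
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical weighted observations of three agents over four time periods.
periods = [
    [[0, 2, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 3], [0, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
lpm = link_probability_matrix(periods)
print(lpm)
```

In the toy data, agents 0 and 1 communicate in three of the four periods, so their link probability is 0.75; agents 1 and 2 communicate once, giving 0.25.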

The LPMs were then used as the edge probabilities for Monte Carlo simulations of these
two networks. In these simulations a uniform random number between 0 and 1 was generated for each
edge. If the random number was less than the edge probability, the edge was added to the graph.
This algorithm was used to create 100,000 instances of each network. The density of each simulated
instance was then computed, yielding a data set of 100,000 network densities for each network.
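The simulation step can be sketched as below. The 4-node LPM is hypothetical, the run uses 10,000 instances instead of 100,000 to keep the example quick, and the function names are illustrative; only the rule of adding an edge when the random draw falls below the link probability comes from the text.

```python
import random
import statistics

def simulate_network(lpm, rng):
    """Draw one undirected network: add edge (i, j) when a uniform draw < p_ij."""
    n = len(lpm)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < lpm[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj

def density(adj):
    """Fraction of possible undirected links that are present."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)

# Hypothetical 4-node LPM; the study uses LPMs estimated from its own data.
lpm = [
    [0.0, 0.8, 0.1, 0.3],
    [0.8, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.6],
    [0.3, 0.2, 0.6, 0.0],
]
rng = random.Random(42)
densities = [density(simulate_network(lpm, rng)) for _ in range(10_000)]
print(statistics.fmean(densities))  # close to the mean link probability
```

The expected density equals the average of the off-diagonal link probabilities, which gives a quick sanity check on any simulated sample.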

To analyze the reliability and consistency of our simulations, Hamming distances were
used as a metric for the differences between two binary adjacency matrices. Using the LPM,
60,000 instances of each network were simulated. The average Hamming distance from each
empirical data set to every other empirical time step was computed. Next, each simulated network
was differenced in the same manner against each empirical time step. These average Hamming
distances were then analyzed using a paired t-test. The results of this test indicate whether the
LPM predicts an instance of the empirical network with more or less error than the error
introduced by the dynamic equilibrium's temporal fluctuations.
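The two building blocks of this comparison can be sketched as follows. The text does not spell out the exact form of the test statistic used, so the textbook paired t statistic shown here is an assumption, and the matrices and function names are illustrative.

```python
import math
import statistics

def hamming_distance(a, b):
    """Number of cells in which two binary adjacency matrices differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def paired_t_statistic(first, second):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(first, second)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Two hypothetical symmetric adjacency matrices for three agents.
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(hamming_distance(a, b))
```

Note that for symmetric matrices every differing undirected link is counted twice, once in each triangle of the matrix; this affects the scale of the distances but not comparisons between them.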

The  normal  distribution  was  fit  to  the  data  of  each  network  using  Maximum  Likelihood 
Estimation.  An  Anderson-Darling  goodness  of  fit  test  and  a  comparison  of  the  estimated 
cumulative  distribution  function  to  the  data’s  empirical  distribution  function  indicated  a  very 
good  fit  for  the  data.  In  addition,  since  the  density  is  a  linear  function  of  the  average  node  degree, 
the  central  limit  theorem  would  suggest  that  the  density  is  normally  distributed  for  each  network. 
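A normal fit and an Anderson-Darling statistic can be computed with the Python standard library alone. The sketch below is not the software used in the study; the sample is simulated stand-in data, the function names are illustrative, and no critical values or p-values are computed.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def fit_normal(data):
    """Maximum likelihood estimates for a normal: sample mean, population std."""
    return NormalDist(fmean(data), pstdev(data))

def anderson_darling(data, dist):
    """Anderson-Darling statistic A^2 of `data` against the distribution `dist`."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        upper = xs[n - i]  # x_(n+1-i) in 1-based order statistics
        total += (2 * i - 1) * (math.log(dist.cdf(x)) + math.log(1 - dist.cdf(upper)))
    return -n - total / n

# Stand-in data: simulated draws, not the densities from the study.
rng = random.Random(1)
sample = [rng.gauss(0.05, 0.001) for _ in range(2000)]
fitted = fit_normal(sample)
a2 = anderson_darling(sample, fitted)
print(fitted.mean, fitted.stdev, a2)
```

Small values of the statistic indicate that the empirical distribution function stays close to the fitted cumulative distribution function, especially in the tails.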

The paired t-test illustrates that the networks simulated using the LPM have a smaller
average Hamming distance to the empirical data sets than the empirical data sets have to each other.
This is evidence that the simulated networks give a more reliable and consistent approximation
of the underlying distribution. The results of the paired t-test for both networks are shown below
in Table 1 and Table 2, respectively. In each table, column one is the average Hamming distance
from each empirical data set to every other empirical data set and column three is the average
Hamming distance from 60,000 networks simulated with the LPM to each of the empirical data
sets.


Table  1.  Paired  t-test  of  Average  Hamming  Distances  for  Network  1  Data. 


M = 5, N = 60000

e_mean     e_stdev    s_mean     s_stdev    t-val      p
409.2857   38.5604    358.0939   12.77466   3.754923   0.00
365.8571   18.2978    320.0974   12.7394    7.073195   0.00
365.8571   29.04266   320.1638   12.79331   4.449958   0.00
377.8571   38.24669   330.6744   12.77289   3.489244   0.00
375.2857   36.10039   328.3765   12.79551   3.675254   0.00
349.8571   38.15944   306.0783   12.7845    3.244918   0.00
373.8571   48.45076   327.0728   12.82622   2.731135   0.01
362.4286   55.63529   317.1509   12.77754   2.301849   0.02

The p-value of each test is at most 0.02, indicating that there is a statistically
significant difference between the empirical Hamming distances and the simulated Hamming
distances. Additionally, since $\mu_{\text{simulated}} < \mu_{\text{empirical}}$, the simulated
networks have, on average, a smaller Hamming distance from each of the empirical data sets than the
empirical data sets have from each other.


Table  2.  Paired  t-test  of  Average  Hamming  Distances  for  Network  2  Data 


M = 5, N = 60000

e_mean     e_stdev    s_mean     s_stdev    t-val      p
1445.000   84.774     1284.338   23.747     3.467      0.001
1394.750   67.487     1239.647   23.703     3.765      0.000
1296.125   85.436     1151.946   23.671     3.2S1      0.001
1315.875   153.533    1169.665   23.718     2.421      0.015
1191.250   112.324    1058.990   23.667     2.732      0.006
1204.875   207.944    1071.116   23.623     1.912      0.056
1167.375   190.431    1037.713   23.695     1.980      0.048
1159.625   204.465    1030.815   23.732     1.888      0.059
1170.125   195.266    1040.142   23.618     1.953      0.051

This test shows that if you select one of the empirical adjacency matrices, there is more
error in predicting it from the remaining empirical data sets than in predicting it with the
LPM. Once the reliability and consistency of the simulations created using the LPM were
confirmed, the distribution of the density could be determined. Since density is a linear function
of a sample average of a network statistic and the sample size is greater than 30 for each
network, the central limit theorem can be used to determine that the underlying distribution of
network density is the normal distribution, with $\mu = 0.00396148$ and $\sigma = 0.0984374$ for
Network 1 and $\mu = 0.0476886$ and $\sigma = 0.000972361$ for Network 2. This is also shown in
Figure 2 and Figure 3.

Figure 2. Stepwise Plot of Density Data for Network 1 and CDF of the Normal Distribution (vertical axis: Cumulative Probability)


Figure 3. Stepwise Plot of Density Data for Network 2 and CDF of the Normal Distribution

Each graph shows the stepwise plot of the 100,000 densities overlaid with the CDF of the
normal distribution. The sum of squared error of this model for Network 1 is 9.60609 and the
sum of squared error of this model for Network 2 is 1.41659. While these terms have no absolute
interpretation, we can confirm upon visual inspection of Figures 2 and 3 that the data are closely
fit by a normal distribution.

Histograms of the densities for Network 1 and Network 2 are shown in Figure 4:

Figure 4. Histograms of Density (one panel per network, each titled "Histogram of Density"; axis label: Density)

Figure 4 shows that the densities of Network 1 and Network 2 both fit a normal
curve. This further reinforces that the densities for both Network 1 and Network 2 follow a
normal distribution. Additional normality tests can be seen below in Figure 5, where the box
plots indicate normal dispersion of data about the quartiles, and the near-linearity of the
normal Q-Q plots indicates normality:

Figure 5. Additional Tests for Normality (panels: Box Plot of Densities for each network; normal Q-Q plots against Theoretical Quantiles)

This  research  validates  the  use  of  the  LPM  for  simulating  networks  based  on  empirical 
data.  The  LPM  provides  a  reliable  and  consistent  network  simulation  that  is  a  strong  framework 
for  analysis.  This  research  can  be  extended  in  at  least  three  aspects:  assessing  the  underlying 
distribution  for  agent  level  statistical  measures,  assessing  the  underlying  distribution  for  other 
network  level  statistical  measures,  and  using  these  distributions  to  statistically  predict  changes 
and  shocks  to  a  network. 


SIMULATING  SOCIAL  NETWORKS  WITH  CONSTRUCT 

Recently, a great deal of literature has focused on methods for simulating network
structure. Simulation offers a number of advantages to the researcher. First, we can use
simulation to emulate the behavior of individuals and predict behavior over time. For example,
when analyzing data over time (longitudinal data), real world data at time 1 can be used to
initialize the simulation program. The simulation can then be used to predict data at time 2.
Second, to the extent that such predictions are accurate, we can use the simulation to do
hypothetical "what if" analyses. For example, we can use the simulation program to examine
alternate hypothetical societies to see what differences in such societies might be necessary to get
a different outcome than that observed in the real data.

The value of such an exercise is not that it proves why the group or society changed as it
did, but that it provides a way of reasoning about the situation and enables the
researcher to create more informed hypotheses that can then be empirically tested. In sociology,
as we move to dynamic models with feedback, we will find that they capture more of the social
situation but that it is incredibly difficult for the researcher to think through, without mistakes,
the implications of such models. Simulation becomes a tool for increasing the specificity of
theory, thinking through the theoretical implications, and generating testable predictions.

In this chapter, we provide an overview of several competing methods of network
simulation. Differences and similarities are identified. The link probability model (LPM) is
briefly illustrated and we identify why it is in many cases preferable to the exponential random
graph (ERG) model. We then move on to summarize Construct and its roots in constructural
sociological theory. We find that the LPM provides a mathematical bridge between
empirically observed data and the multi-agent simulation Construct, which is based on
constructuralist theory. Construct, in turn, introduces additional relational dependence into the
LPM, correcting for its naive assumption of independence. Finally, we depict how this
sociological theory translates into the LPM, how Construct leverages the LPM, and relate the
results of empirical studies conducted by others on the effectiveness of Construct versus
alternatives.


Exponential  Random  Graph  Models 

ERG models are used in social network analysis as statistical models that enable an
analyst to conduct inference on dependent relational data (Goodreau, 2007; Robins et al., 2007).
The ERG model is therefore less restrictive than earlier models for social networks that assumed
dyadic independence (Holland and Leinhardt, 1981). In many social network applications the
relationship between two individuals depends on relationships between those individuals and others
in the network, cognitive limits on the number of relationships that can be maintained, similarity
between individuals, and more. The ERG model framework for relaxing the dyadic
independence assumption is thus essential for accurate inference in many data sets.

Estimating ERG model terms and parameters can be computationally challenging in large
networks (Snijders, 2002; Pattison and Robins, 2002). Markov Chain Monte Carlo estimation
has been used to fit these models to data (Goodreau, 2007; Robins et al., 2007;
Handcock, 2003, 2002; Snijders, 2002; Pattison and Robins, 2002). The Markov dependence in
these models leads to problems of degeneracy, which are discussed in detail by Handcock
(2003, 2002). Essentially, model degeneracy occurs when the observed data are
almost impossible under the specified model. This often occurs when explanatory terms are
highly correlated and there is insufficient data to construct an appropriate model. Several
advances in ERG models have been proposed, including curved exponential family models
(Hunter and Handcock, 2006) and neighborhood models (Robins et al., 2005). It is not clear,
however, that these advances have completely removed issues of model degeneracy.


Link  Probability  Model 

The LPM (McCulloh & Lospinoso, 2007) has been proposed as an alternative to
the ERG model. The LPM framework for viewing the probability space of a social network
avoids issues of model degeneracy while preserving flexibility for modeling dyadic
relationships. It provides researchers with an improved means to understand the probability
space of the network, under certain conditions. The LPM is a square matrix where the rows and
columns correspond to the nodes in a social network. The entries are the link probabilities of the
directed link from the row node to the column node. This is not to be confused with an adjacency
matrix, where the entries are either zero or some number representing the strength of a
relationship between nodes. The link probability is a number between 0 and 1 and determines
the likelihood of a link being present in an observed adjacency matrix.

The link probabilities can be derived from empirical data in several ways. Given network
data collected over multiple time periods on a group of subjects, the link probabilities can be
estimated by the proportion of link occurrences, e(i,j), for each cell in the adjacency matrix,
a(i,j). In the case of communication networks, statistical distributions can be fit to the time
between messages for each potential link in the network. For a specified period of time, t, the
link probability p for each pair of entities i and j can be found by integrating the probability
density function from 0 to t.
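As a worked illustration of the second approach, suppose the time between messages for one pair is modeled as exponential. The text does not name a distribution, so the exponential choice is an assumption of this sketch, as are the function name and the sample gaps. Integrating the exponential density from 0 to t gives 1 - exp(-rate * t).

```python
import math

def link_probability_exponential(gaps, t):
    """P(at least one message within time t) under an exponential gap model.

    `gaps` are observed times between messages for one ordered pair (i, j).
    The rate is estimated by maximum likelihood as 1 / mean gap, and
    integrating the exponential density from 0 to t gives 1 - exp(-rate * t).
    """
    rate = len(gaps) / sum(gaps)
    return 1.0 - math.exp(-rate * t)

# Hypothetical gaps (in hours) between messages from agent i to agent j.
gaps_hours = [2.0, 5.0, 3.0, 6.0]
p_ij = link_probability_exponential(gaps_hours, t=4.0)
print(p_ij)
```

With a mean gap of four hours and a four-hour observation window, the rate times the window equals 1, so the link probability is 1 - 1/e, or roughly 0.63.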

Relational dependence in link probabilities is accounted for in the LPM by the historic
presence of links. Relational dependence in links can occur for many reasons. For example, if
a boss sends an email to two employees telling them to work on a project, it will affect the
probability of communication between the two employees. The LPM does not modify the link
probability based on such perceived factors that may adjust the probability of two nodes having
a relationship. Instead, the LPM accounts for relational dependence by assuming that it will be
reflected in the historic presence or absence of links between nodes. If a boss often gives a task
to two employees, then a link between the employees is likely to be more
common in past observed networks. This does not account for all of the relational
dependence in the network. To introduce a realistic degree of dependence, the LPM would need
to be modified at each time step based on social theory established in the literature.


Construct for Multi-Agent Simulations

Construct¹ is a multi-agent simulation grounded in constructuralist theory (Carley, 1990;
Carley, 1995). The LPM provides the stochastic engine for the multi-agent simulation. At each
time step the link probabilities are determined by the nodes' perceived homophily,
socio-demographics, and proximity. These social factors re-introduce the additional relational
dependence missing in the raw LPM.

Construct is a dynamic-network multi-agent simulation model that can be used to
examine the evolution of social, knowledge, and activity networks in response to external
interventions and the normal course of human interaction (Carley, 1990; Carley, 1991). Network
evolution and the diffusion of information and beliefs through social networks can be examined
using Construct (Carley, 1995; Hirshman & Carley, 2007b; Hirshman, Martin & Carley, 2008).
Construct captures group dynamics under diverse cultural and technological
configurations (Schreiber & Carley, 2004). Consequently, organizational change (Carley & Hill,
2001), socio-cognitive inconsistencies (Carley & Krackhardt, 1996), and the impact of
communication technologies (Carley, 1995; Carley, 2002) can be tested with Construct. To use
Construct, the researcher specifies both the agents, replete with information processing
capabilities (Hirshman, Carley & Kowalchuk, 2007a), and the networks in which they are
embedded (Hirshman, Carley & Kowalchuk, 2007b).


Constructuralism 

Before we explore the ability of network simulation to represent reality, we must first
lay out the foundational theory behind constructuralism as it applies to the multi-agent simulation
Construct. Advances in both cognitive science and network theory have engendered the belief
that it should be possible to develop analytical models of the relationships between individuals
that would enable quantitative predictions of changes in interaction and that take into account
both the self and the society, the individual and the group, the cognitive and the social. These
advances have renewed the interest, originally seen in social comparison theory (Festinger, 1954),
cognitive dissonance theory (Festinger, 1957), and balance theory (Heider, 1958), in building
a mathematics of group change as a function of individual change. There remains, however,
a gap between the cognitive or individual perspective, in which changes in relationships
between individuals result from independent dyadic encounters, and the social or structural
perspective, in which changes in social structure alter relationships between individuals.
Currently a great deal of research is directed at bridging this gap. On the individual side, the
linking of symbolic interactionism and role theory can be viewed as a move to incorporate social
or group factors into an otherwise predominantly cognitive perspective.

Similarly, affect control theory is a move to incorporate the social, in terms of task
constraints and social knowledge, into a cognitive and affective model of the individual's
evaluation of, and hence determination of, future action (Heise 1971, 1979, 1987; Smith-Lovin
1987). The focus on the change in the individual or his or her relationships to an actual or a
generalized other treats the group or social world as present, but relatively fixed. This implicitly
assumes that social or group behavior is somehow an aggregate of the results of independent
encounters between pairs of individuals. This last assumption is not exclusive to those who
propose more cognitively rich models of behavior.

¹ The Construct system itself is freely downloadable from the CASOS website, http://www.casos.cs.cmu.edu/projects/construct

For example, we also see it in the work on status and dominance, where hierarchies are
viewed as resulting from independent dyadic encounters (Berger, Conner, and Fisek 1974; Rosa and
Mazur 1979; Lamb 1986). On the other hand, evidence is being amassed that group behavior cannot
be accounted for by aggregating independent dyadic encounters (Chase 1974, 1980; Ridgeway
and Diekema 1989) but is rather an emergent property of the simultaneous actions of all group
members (Bales 1950; Homans 1950; Chase 1974, 1980; Fararo and Skvoretz, 1986). The
mechanism by which such group behavior emerges remains elusive. As a step toward locating
this mechanism, research in the structural and network traditions has been moving toward
providing explanations, and hence predictions, of individual cognitive change in terms of the
individual's social position.

This can be seen in Burt's model of action (1982), where perceived similarity, and hence
norms, attitudes, likelihood of adopting innovations, and so on, is a function of social position.
This is further supported by Krackhardt's notion (1985, 1986, 1987) that the individual's social
cognition (which he defines as the individual's perception of who interacts with whom) is a
function of social position. These works reveal a more cognitive actor than that revealed by
classic structuralists, one whose behavior is nonetheless socially situated. Yet, like the more
cognitive individual models, these social models of individual change still focus on the change in
the individual while maintaining a relatively fixed social world. Thus, both the individual and the
social perspectives treat the social world as fundamentally stable. Consequently, neither
perspective provides a mechanism by which such individual changes can produce social change.
Neither approach is sufficient to explain, let alone quantitatively predict, changes in the
interaction patterns for all members of the society at once. Rather, the explanations of social
change are highly contextual, relying on situation-specific factors, forces, and constraints such as
goals, coercion, bureaucratization, change in group size, and membership rituals.

Every group has a population consisting of some number of individuals. In every group,
there is a set of information or facts that is potentially learnable by the members of the group.
This set of information contains each piece of information that is known by at least one group
member. The number of such facts will be denoted by K. An individual i, for any piece of
information k, either knows that fact or does not. This is denoted by $F_{ik}(t) = 1$ (where t
denotes time) if fact k is known by individual i at time period t, and $F_{ik}(t) = 0$ otherwise.
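
As a minimal sketch of this notation, the knowledge state at one time period can be held as a binary matrix with one row per individual and one column per fact. The matrix values and the helper names `knows` and `knowers` are illustrative assumptions, not part of Construct.

```python
# Knowledge state at a single time period t: F[i][k] == 1 if individual i knows fact k.
# The values below are made up for illustration.
F = [
    [1, 0, 1, 0],  # individual 0 knows facts 0 and 2
    [1, 1, 0, 0],  # individual 1 knows facts 0 and 1
    [0, 0, 1, 1],  # individual 2 knows facts 2 and 3
]
N = len(F)     # number of individuals in the group
K = len(F[0])  # number of facts, each known by at least one member


def knows(F, i, k):
    """Return True if individual i knows fact k."""
    return F[i][k] == 1


def knowers(F, k):
    """Return the individuals who know fact k."""
    return [i for i in range(len(F)) if F[i][k] == 1]


# By definition of the fact set, every fact has at least one knower.
assert all(knowers(F, k) for k in range(K))
```

Tracking one such matrix per time period gives the full history $F_{ik}(t)$.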

Every society has a culture, which can be thought of as the distribution of information
across the population. At a particular point in time, say time period t, an individual i has a certain
probability of interacting with another member of the society, j. This is exactly where the LPM
comes into consideration. Every society has a social structure, which can be thought of as the
distribution of interaction probabilities across the population. The initial make-up of these
probabilities and the transition of these probabilities at different time points are thus determined
by several factors.


Construct and Constructuralism


The first assumption of the Construct model posits that interaction leads to shared
knowledge. It is generally demonstrable that individuals acquire information (and hence will
come to share knowledge) during interactions. To represent this process, a variety of simplifying
assumptions are made. All pieces of information are entirely unstructured and undifferentiated.
The individual may know conflicting information, such as that the sky is blue and that the sky is
green. Consequently, the overlap in what two individuals know is just the number of pieces of
information that they both know. When two individuals interact, each communicates one fact to
the other. Individuals always learn the piece of information that is communicated to them.
Consequently, if individual i knows that the sky is blue and individual j knows that the sky is
green, and individual j communicates to individual i that the sky is green, the overlap in their
knowledge increases. Hence they have more shared knowledge. All facts known by the
individual are equally likely to be communicated.

According  to  constructuralism,  both  the  individual  cognitive  world  and  the  socio-cultural 
world  are  continuously  constructed  and  reconstructed  as  individuals  concurrently  go  through  a 
cycle  of  action,  adaptation,  and  motivation.  During  this  process  not  only  does  the  socio-cultural 
environment  change,  but  social  structure  and  culture  co-evolve  in  synchrony.  Carley  (1991a) 
defined  the  following  primary  assumptions  in  describing  constructuralism:  individuals  are 
continuously  engaged  in  acquiring  and  communicating  information,  what  individuals  know 
influences  their  choices  of  interaction  partners,  and  an  individual's  behavior  is  a  function  of  his 
or  her  current  knowledge.  In  addition  to  these  primary  assumptions  there  were  a  series  of  implicit 
assumptions  that  upon  explication  serve  to  clarify  and  expand  the  primary  assumptions. 
Following  is  an  expanded  list  of  assumptions,  numbered  to  clarify  their  relation  to  the  primary 
assumptions: 

1a. Individuals, when interacting with other individuals, can communicate information.

1b. Individuals, when interacting with other individuals, can acquire information.

1c. Individuals can learn the newly acquired information, thus augmenting their store of
knowledge.

2a.  Individuals  select  interaction  partners  on  the  basis  of  relative  similarity  and 
availability. 

2b.  Individuals  engage  in  interaction  concurrently,  thus  an  individual's  first  choice  of 
interaction  partner  may  not  be  available. 

3a.  Individuals  have  both  an  information  processing  capability  and  knowledge  which 
jointly  determine  the  individual's  behavior. 

3b.  Individuals  have  the  same  information  processing  capabilities. 

3c.  Individuals  differ  in  knowledge  as  each  individual's  knowledge  depends  on  the 
individual's  particular  socio-cultural-historical  background. 

3d.  Individuals  can  be  divided  into  types  or  classes  on  the  basis  of  extant  knowledge 
differences. 
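
The assumptions above can be sketched as one simulated time period. This is a minimal illustration, not the Construct implementation: the relative-similarity rule (shared facts normalized over the partners still available), the uniform fallback when nothing is shared, and the names `overlap`, `interaction_probs`, and `step` are all assumptions made for the example.

```python
import random


def overlap(F, i, j):
    """Number of facts that individuals i and j both know."""
    return sum(a * b for a, b in zip(F[i], F[j]))


def interaction_probs(F, i, candidates):
    """Relative similarity of i to each candidate (assumed form: overlap normalized over candidates)."""
    weights = [overlap(F, i, j) for j in candidates]
    total = sum(weights)
    if total == 0:
        # No shared knowledge with any available partner: choose uniformly (an assumption).
        return [1.0 / len(candidates)] * len(candidates)
    return [w / total for w in weights]


def step(F, rng):
    """One time period: pair up available individuals, then exchange one fact each way."""
    n = len(F)
    available = set(range(n))
    order = list(range(n))
    rng.shuffle(order)
    pairs = []
    for i in order:
        if i not in available:
            continue  # i was already chosen as someone else's partner (assumption 2b)
        candidates = sorted(available - {i})
        if not candidates:
            break  # i is the only individual left, so no interaction is possible
        probs = interaction_probs(F, i, candidates)
        j = rng.choices(candidates, weights=probs, k=1)[0]
        available -= {i, j}
        pairs.append((i, j))
    snapshot = [row[:] for row in F]  # facts are drawn from knowledge held before this period
    for i, j in pairs:
        for sender, receiver in ((i, j), (j, i)):
            known = [k for k, v in enumerate(snapshot[sender]) if v]
            if known:
                # Every known fact is equally likely to be sent, and it is always learned.
                F[receiver][rng.choice(known)] = 1
    return pairs
```

Calling `step(F, random.Random(0))` repeatedly on a binary knowledge matrix `F` updates it in place; because individuals never forget, knowledge overlap can only grow under these assumptions.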

These assumptions lead to a simulation template that features a dynamic LPM as the
stochastic engine. We briefly present Construct in this fashion and go on to show that it
performs well in simulating empirically obtained networks.


Data

The LPM and ERG models are both used to model the Sampson (1969) Monk data and
the Newcomb (1961) Fraternity data, two classical datasets within the sociology literature.
Sampson recorded social network data on the strength of “liking” between monks in a monastery
at three different points in time. Between surveys, four of the monks were actually expelled from
the monastery. The social network of these individuals therefore changed over time.
Newcomb provided 17 college transfer students with fraternity-style housing in exchange for
their participation in a study on friendship formation. Every week they were required to rate, on
a scale of 1 to 16, their preference for each of the others in the house. Since ERG models require
binary data, we use the dichotomous version of the Newcomb data proposed by Krackhardt
(1998), which records a directed link from node i to node j if node i rated node j as one of its top 8
closest relationships in the network. There are 15 time periods in the Newcomb data.
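
The dichotomization rule can be sketched as follows. The sketch assumes that a rating of 1 means most preferred and that diagonal entries are ignored; the function name `dichotomize` and the toy ranks are illustrative and are not the Newcomb data.

```python
def dichotomize(ranks, top=8):
    """Turn a matrix of preference ranks into a binary directed network.

    ranks[i][j] is the rank that node i gives node j (1 = most preferred, an assumption).
    A link i -> j is recorded when j is among i's `top` most preferred others.
    """
    n = len(ranks)
    return [
        [1 if i != j and ranks[i][j] <= top else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy example with 4 nodes, keeping each node's top 2 choices.
toy_ranks = [
    [0, 1, 2, 3],
    [2, 0, 1, 3],
    [3, 1, 0, 2],
    [1, 2, 3, 0],
]
toy_network = dichotomize(toy_ranks, top=2)
```

With complete rankings and no ties, every node ends up with exactly `top` outgoing links, so all variation between networks lies in where the links point rather than in how many there are.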


Comparing  the  Models 

The ERG model and LPM are investigated for their strengths and weaknesses in modeling
longitudinal data in McCulloh (2008). We re-present the results here. For the Sampson (1969)
monk data, an ERG model fit by Hunter et al. (2008) is used. An ERG model is also fit to the
Newcomb (1961) fraternity data. An LPM is fit to both the Sampson and Newcomb data
sets. Monte Carlo simulation is used to generate instances of the Sampson Monk social network
and the Newcomb Fraternity social network under the ERG model and the LPM.

A distance measure is required to compare the similarity between the dichotomous networks
generated using the ERG model, the LPM, and the empirical data. Hamming distance (1950) is a
logical choice, since it counts the links on which two dichotomous networks differ. If the data were
weighted networks and the models generated weighted networks as well, then a Euclidean
distance would be appropriate. The quadratic assignment procedure (QAP) (Krackhardt, 1987)
could be used to compare the correlation between networks; however, the correlation coefficient
does not change linearly with network distance. The average Hamming distances from each
empirical data set to every other empirical data set and from each simulated network to each
empirical data set were calculated. These average Hamming distances were then compared using
a 2-sample t-test. The results of this test indicate whether the LPM or the ERG model models
the empirical networks with more or less error.
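
The comparison described above can be sketched in a few lines. The report does not state which form of the 2-sample t-test was used, so the unequal-variance (Welch) statistic below is an assumption, as are the function names and the toy numbers.

```python
import math
from statistics import mean, variance


def hamming(a, b):
    """Number of ordered pairs (i, j), i != j, on which two binary networks disagree."""
    n = len(a)
    return sum(a[i][j] != b[i][j] for i in range(n) for j in range(n) if i != j)


def welch_t(x, y):
    """Two-sample t statistic allowing unequal variances (assumed form)."""
    standard_error = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


# Toy usage: distances from simulated networks to one empirical network.
erg_distances = [10, 12, 11, 13]  # made-up Hamming distances under one model
lpm_distances = [4, 5, 6, 5]      # made-up Hamming distances under the other
t_statistic = welch_t(erg_distances, lpm_distances)
```

A positive statistic here means the first sample of distances is larger on average, that is, the first model sits farther from the empirical network.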

Table 3 shows the distance from the Sampson Monk data to both the ERG model and the LPM.

Table 4 shows the distance from the Newcomb Fraternity data to both the ERG model and
the LPM. It can be seen in both Tables 3 and 4 that the p-values are significant at the 0.05 level.
This means that there is a significant difference between how well the ERG model and the LPM
model the empirical data. The positive values for the test statistic indicate that the LPM’s average
Hamming distance to the empirical data is less than that of the ERG model. We can
conclude from this test that the LPM does a significantly better job of modeling the empirical
data than the ERG model.



Table 3. ERG and LPM Distance to Empirical Data for the Sampson Monk Data

Time     ERG Mean   ERG Std.   LPM Mean   LPM Std.   t-Test
Period   Hamming    Dev.       Hamming    Dev.       Statistic   p-value
         Distance              Distance
1         98.7      5.697      27.67      3.5922     39.43       0.0006
2         99.1      6.2263     24.99      3.5935     37.64       0.0007
3        103.7      6.2902     24.66      3.5945     39.74       0.0006

Table 4. ERG and LPM Distance to Empirical Data for the Newcomb Fraternity Data

Time     ERG Mean   ERG Std.   LPM Mean   LPM Std.   t-Test
Period   Hamming    Dev.       Hamming    Dev.       Statistic   p-value
         Distance              Distance
1        139.7      8.3938     91.9       5.1913     18.0147     0.0353
2        138.9      8.1847     75.1       5.2128     24.6573     0.0258
3        137.3      8.2872     48.3       5.2226     33.9732     0.0187
4        135.5      9.3363     49.7       5.2340     29.0460     0.0219
5        134.1      8.9870     50.1       5.2319     29.5558     0.0215
6        136.3      8.5251     45.5       5.2440     33.6983     0.0189
7        133.9      9.0609     47.3       5.2397     30.2202     0.0211
8        134.1      7.2946     51.9       5.2591     35.6377     0.0179
10       133.7      5.1865     64.2       5.2223     42.3990     0.0000
11       132.7      6.0562     53.4       5.2074     41.4119     0.0006
12       136.3      8.4466     51.1       5.2147     31.8930     0.0200
13       134.9      9.0117     46.6       5.2311     30.9989     0.0205
14       133.9      5.4457     46.1       5.2230     50.9574     0.0000
15       133.1      5.7242     Ml         5.2378     47.4518     0.0004

A similar test was done to compare the Hamming distance between the empirical data at
each time point with the empirical data at all other time points. The LPM was found to have no
more error than that present between different time points in the empirical data. This provides
evidence to validate the LPM as an effective method for simulating data.


The LPM has additional advantages. The LPM avoids the issues of model degeneracy
inherent in the ERG model. The probability of link occurrence is based on the historic presence
of links and does not use a Markov assumption or overspecify a statistical model. For these
reasons, the LPM provides an alternative method for modeling and conducting longitudinal
social network analysis. For our purpose in this chapter, the LPM’s ability to replicate empirical
data makes it a reasonable stochastic engine for the Construct multi-agent simulation model.
The multi-agent simulation simply adds relational dependence to a model that
already performs well, making it more realistic and capable of evolution over time.
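
To make the idea of link probabilities based on historic presence concrete, the sketch below estimates each probability as the fraction of observed periods in which the link appeared and then draws a network from those probabilities. The simple proportion estimator, the independence of links, and the function names are assumptions for illustration; this section of the report does not give the exact estimator.

```python
import random


def estimate_link_probs(networks):
    """Fraction of observed periods in which each directed link i -> j was present."""
    periods = len(networks)
    n = len(networks[0])
    return [
        [sum(net[i][j] for net in networks) / periods for j in range(n)]
        for i in range(n)
    ]


def simulate_network(P, rng):
    """Draw one network with independent links, each present with probability P[i][j]."""
    n = len(P)
    return [
        [1 if i != j and rng.random() < P[i][j] else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy usage with two made-up observation periods on three nodes.
observed = [
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    [[0, 1, 1], [0, 0, 0], [1, 0, 0]],
]
P = estimate_link_probs(observed)
sample = simulate_network(P, random.Random(0))
```

Links seen in every period are always drawn and links never seen are never drawn, which is why a model of this kind needs no Markov assumption about the previous network.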

Applications 

The  theoretical  underpinnings  of  constructuralism  as  manifested  in  Construct  lead  us  to  a 
multi-agent  simulation  which  utilizes  a  dynamic  LPM  as  a  stochastic  engine  for  the  development 
of  knowledge  diffusion  and  relationship  building.  What  does  this  simulation  provide  the  user? 

The  simulation  provides  an  accurate,  realistic  simulation  of  social  dynamics.  We 
envision  several  ways  in  which  this  will  be  important  to  the  military  in  particular  and  the  wider 
academic  audience  in  general.  Construct  can  be  used  as  a  valuable  decision  support  tool  for 
military  commanders.  The  social  dynamics  of  terrorist  organizations,  local  culture,  or  friendly 
military  forces  can  all  be  modeled  with  the  simulation.  A  commander  can  war-game  potential 
courses  of  action,  and  evaluate  alternatives  using  Construct.  It  can  be  very  difficult  to  reason 
through  the  many  potential  interactions,  factors,  and  competing  theories.  This  simulation 
provides  a  framework  that  is  grounded  in  social  theory,  and  validated  against  empirical  evidence, 
that  can  be  used  to  evaluate  potential  courses  of  action. 

For example, a commander might consider detaining one or more suspected terrorists.
By modeling the course of action in Construct, the commander can observe the impact of
removing the individual on the organization’s performance, situational awareness, and overall
effectiveness. Given limited resources, the commander could even use the simulation to optimize
the choice of individuals to remove from the social group. The simulation gives the military
analyst the ability to predict the future social dynamics of an organization. This is a powerful
combat multiplier for today’s non-kinetic asymmetric warfighter.

The  Army  could  also  use  Construct  to  evaluate  the  organizational  structure  of  newly 
formed  doctrinal  units,  such  as  the  Future  Combat  System  (FCS)  operational  units.  The 
simulation  can  evaluate  which  personnel  communicate  more  or  less  frequently.  This  can  help 
inform  efficient  organization  of  soldiers  from  staff  organizations  to  vehicle  crews.  Focused 
research  on  social  groups  can  follow  better  experimental  design,  and  yield  greater  knowledge,  if 
an  array  of  research  questions  is  first  evaluated  in  simulation.  Social  dynamics  are  complex  and 
it  can  be  difficult  to  correctly  reason  through  different  scenarios.  Simulation  can  provide  insight 
that  may  shape  the  research  questions  to  be  more  effective. 

Finally,  the  normal  behavior  of  an  organization  can  be  simulated  many  times.  From  the 
simulations,  statistical  distributions  can  be  fit  to  various  measures  of  group  behavior.  These 
statistical  distributions  can  be  used  to  evaluate  statistical  hypotheses  or  to  detect  statistically 
significant  differences  between  observations  of  the  group  and  normal  behavior.  This  statistical 
framework,  therefore,  increases  the  relevant  findings  one  can  discover  in  socially  dynamic 
organizations. 
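
The baseline idea in the preceding paragraph can be sketched as follows: simulate a group measure many times under normal behavior, summarize its distribution, and score a new observation against it. The choice of link count as the measure, the independent-link simulator, the normal approximation behind the z-score, and all names and numbers are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def link_count(net):
    """Total number of links in a binary network."""
    return sum(sum(row) for row in net)


def simulate_counts(P, runs, rng):
    """Simulate `runs` networks with independent links and return their link counts."""
    n = len(P)
    counts = []
    for _ in range(runs):
        net = [
            [1 if i != j and rng.random() < P[i][j] else 0 for j in range(n)]
            for i in range(n)
        ]
        counts.append(link_count(net))
    return counts


def z_score(observed, baseline):
    """Standardized distance of an observation from the simulated baseline."""
    return (observed - mean(baseline)) / stdev(baseline)


# Toy usage: 10 agents, every possible link present with probability 0.3.
P = [[0.0 if i == j else 0.3 for j in range(10)] for i in range(10)]
baseline = simulate_counts(P, 500, random.Random(42))
score = z_score(60, baseline)  # an observed count far above the simulated norm
```

A large absolute score suggests the observed group is behaving differently from the simulated norm; other measures of group behavior could be substituted for the link count.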

We have presented two models for describing the behavior of social networks: the ERG
model and the LPM. Both models were fit to two well-known data sets in the literature, the
Sampson Monk data and the Newcomb Fraternity data. The LPM modeled the data with a
statistically significantly better fit than the ERG model. The benefit of the LPM was further
demonstrated by finding that the difference between the LPM fit and the empirical data was no
larger than the average difference between any two samples of the empirical data.

The key limitation of the LPM is that it does not account for all of the relational
dependence that is known to exist in socially connected groups. The multi-agent simulation
Construct conveniently overcomes this limitation. Construct essentially uses the LPM as its
stochastic engine. The link probabilities at each time step are affected by constructuralist theory
established in the literature. Factors such as perceived homophily, shared knowledge, proximity,
and socio-demographic variables all affect the link probabilities at each time period. These
factors introduce relational dependence into the LPM. The relative weighting that these factors
have can be adjusted by the user. This creates a flexible simulation tool, grounded in empirical
evidence and sociological theory.
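
One simple way to picture user-adjustable weighting is a weighted sum of factor scores that is normalized into interaction probabilities for each agent. The sketch below is only an illustration of that idea: the combination rule, the factor names, and the function `link_probabilities` are assumptions and do not describe how Construct itself combines factors.

```python
def link_probabilities(factors, weights):
    """Combine factor scores into row-normalized interaction probabilities.

    factors: dict mapping a factor name to an n x n matrix of nonnegative scores.
    weights: dict mapping the same names to user-chosen nonnegative weights.
    """
    names = list(factors)
    n = len(factors[names[0]])
    P = []
    for i in range(n):
        scores = [
            0.0 if i == j else sum(weights[f] * factors[f][i][j] for f in names)
            for j in range(n)
        ]
        total = sum(scores)
        P.append([s / total if total > 0 else 0.0 for s in scores])
    return P


# Toy usage with two made-up factors on three agents.
factors = {
    "shared_knowledge": [[0, 2, 1], [2, 0, 0], [1, 0, 0]],
    "proximity": [[0, 0, 1], [0, 0, 1], [1, 1, 0]],
}
equal = link_probabilities(factors, {"shared_knowledge": 1.0, "proximity": 1.0})
knowledge_only = link_probabilities(factors, {"shared_knowledge": 1.0, "proximity": 0.0})
```

Because shared knowledge changes as agents interact, recomputing the probabilities at each time step is what introduces dependence between links over time.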

While Construct may be a powerful simulation tool, the current user interface limits its
capability. The Organizational Risk Analyzer (ORA) is a software package maintained by the
Center for Computational Analysis of Social and Organizational Systems (CASOS) at Carnegie
Mellon University. ORA has an interface for near-term impact, which allows the user to isolate
certain agents in a socially networked group and evaluate the impact of the isolations through
simulation using Construct. Other than through this interface, simulation runs must be conducted
using an XML script. Future research will hopefully provide funding to better develop the user
interface for the simulation. An improved user interface might make Construct available to a
division ORSA to better evaluate various courses of action. This improved ability to war-game
various scenarios may enhance the effectiveness of those military units.


ACTOR  ORIENTED  SOCIAL  NETWORK  SPECIFICATION  AND  ESTIMATION 

Multi-agent simulation is rapidly emerging as a popular tool for understanding complex
social and organizational structures. Historically, these models have been either very simple, or
have contained few agents due to issues of computational complexity. As the power of
computers continues to increase rapidly, more complex multi-agent simulation models are
needed. Social network analysis has become equally popular for understanding social and
organizational structures. This chapter applies methods in longitudinal social network analysis to
multi-agent simulation.

Human  organizations  and  social  groups  are  composed  of  individuals.  The  individuals  can 
be  related  in  a  number  of  different  ways:  friendship,  trust,  ethnicity,  shared  ideology,  shared 
goals,  and  more.  Some  of  these  relationships  are  important  in  understanding  the  behavior  and 
actions  of  the  organization  or  social  group.  Other  relationships  are  unimportant.  Furthermore, 
some  relationships  affect  others,  creating  very  complex  dynamic  behavior.  Multi-agent 
simulation  is  used  to  model  individual  agents  that  can  act,  interact,  and  learn.  The  agents  exist  in 
an  environment  where  their  interaction  is  constrained  by  their  position  in  various  social  networks 
defined  by  the  aforementioned  relationships  among  others.  Group  behavior  emerges  as  a  result  of 
the  complex  interaction  between  agents. 

Understanding network structure is very important for modeling social groups and
organizations in a realistic manner. For example, Valente (2007) was interested in modeling the
diffusion of contraceptive innovations in Cameroon. He found that real-world adoption rates
did not follow simulation models when the network relationships were ignored. An individual’s
decision to adopt an innovation is highly dependent on the decisions of adjacent individuals in a
social network. Assumptions of random mixing of individuals, therefore, generate inaccurate
adoption rates, since trust and friendship networks are important factors. When the simulation
accurately models the underlying social networks of people in Cameroon, more accurate
diffusion models are obtained. For a more thorough review of the diffusion of innovations, see
Valente (2007).

Understanding  social  networks  is  not  only  important  for  modeling  diffusion  processes. 
Social  networks  are  important  for  modeling  any  social  group  or  organization  involving  humans. 
Multi-agent  simulation  modelers  should  be  familiar  with  important  theories  in  social  network 
analysis  that  govern  relationships  between  individual  agents.  Incorporating  some  of  these 
theories  into  simulation  models  will  contribute  to  more  realistic  models. 

It  is  also  important  to  be  able  to  identify  what  social  theories  are  applicable  to  certain  problems 
and  situations.  Relationships  that  may  be  important  in  one  context  may  be  unimportant  in 
another.  Social  network  analysts  are  able  to  statistically  test  for  the  significance  of  various  social 
theories  in  longitudinal  network  data.  Equipped  with  significant  theories  governing  network 
formation  in  empirical  data,  the  multi-agent  simulation  modeler  can  include  these  factors  in  their 
simulation,  thereby  creating  more  realistic  agent  interactions. 



This chapter will present a novel approach to multi-agent simulation and demonstrate it
on a real-world network data set. Longitudinal network data is collected in a natural experiment
focused on studying shared situational awareness and communication. An actor-oriented model
(Snijders, 2007) is fit to the data to determine significant social theories contributing to network
dynamics. These theories can then be incorporated in a multi-agent simulation model to create
more accurate organizational behavior.

The chapter is organized as follows. First, we describe a theory of network dynamics
used in social network analysis. Next, we describe the concept of network utility. In Section 4 we
describe network data collected from a natural experiment conducted at the U.S. Military
Academy. Section 5 describes a longitudinal analysis of that data, with the results presented in
Section 6. In Section 7, we highlight implications for multi-agent simulation modelers and
provide directions for future work.


Network  Dynamics 

Network dynamics is a term used in social network analysis to describe the behavior of
networks over time (Doreian & Stokman, 1997). Social network analysts have been conducting
research in this area for quite some time (Sampson, 1969; Romney, 1989; Sanil et al., 1995;
Snijders, 1990; Frank, 1991). There are four behaviors that can occur in a network over time:
Stability, Evolution, Random Change, and Mutation (McCulloh & Lospinoso, 2007; Johnson et
al., 2003).

Network Stability occurs when the underlying relationships that connect agents in a
network remain the same over time. The observed data may contain error. Some relationships
may not be observed, while some observed connections may be inadvertent and reflect no
underlying relationship. Consider email communication: an agent may communicate with some
friends every day and with others sporadically, and may even accidentally email someone they do
not know by hitting the wrong name in a distribution list or replying to all in an email. While the
observed networks may fluctuate from day to day, the underlying relationships remain unchanged.
They have reached a dynamic equilibrium for at least the short term.

Network Evolution occurs when agent interaction over time changes the underlying
relationships. Furthermore, evolution assumes that there is some underlying stochastic process
that causes change over time. There are two leading approaches for modeling network evolution.
One general class of approaches uses Markov chains (Wasserman, 1977, 1978, 1980). Under
this approach, the network transitions from one network state to the next over time. The future
state of the network is conditioned only on the current state and not on the states at previous
time steps. Research has focused on the structure of the transition matrix that governs the
evolution of the networks.
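
The Markov property can be illustrated with a deliberately simplified sketch in which every link follows its own two-state chain (absent or present) and the next network depends only on the current one. Treating links as independent, the transition values, and the function names are assumptions for illustration; the cited Markov chain models are richer than this.

```python
import random

# TRANSITION[current][next] for a single link, with state 0 = absent and 1 = present.
# The probabilities are made up for illustration.
TRANSITION = [
    [0.9, 0.1],  # an absent link stays absent with 0.9, forms with 0.1
    [0.2, 0.8],  # a present link dissolves with 0.2, persists with 0.8
]


def next_network(net, transition, rng):
    """Draw the next network using only the current network (the Markov property)."""
    n = len(net)
    return [
        [0 if i == j else int(rng.random() < transition[net[i][j]][1]) for j in range(n)]
        for i in range(n)
    ]


def stationary_tie_probability(transition):
    """Long-run probability that a link is present under the two-state chain."""
    form = transition[0][1]
    dissolve = transition[1][0]
    return form / (form + dissolve)


current = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
following = next_network(current, TRANSITION, random.Random(0))
```

With these values a link is present about one third of the time in the long run, which shows how the transition matrix alone determines the equilibrium density.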

An alternate approach for modeling network evolution is multi-agent simulation (Doreian
1983; Carley 1991, 1999). Under this approach, agent-based models are created in which agents
interact according to some established social theory. Interactions allow the agents to change in
some important way that may affect future interaction.

Random Change in a network occurs when the future behavior of the network is
independent of the current state (McCulloh 2009). In other words, the agent interaction is
affected by something external to the network. For example, an Army platoon may evolve as
individual agents interact and communicate. When that same Army platoon comes under attack
by the enemy, there is something fundamentally different about their relationships. There is
nothing inherent in the individual agent interactions that could have predicted the change in
network behavior as a result of the enemy attack.

It is also possible that a random change could initiate network evolution. We call this
type of behavior a Mutation. In our Army example, it is possible that under the stress of enemy
combat an individual agent displays remarkable courage or cowardice. This individual behavior
may improve or remove the status of an agent. Other agents in the network may respond
differently to that agent based on its actions during the random change.

One possible explanation of network dynamics is agent-driven optimization. Agents in a
network attempt to optimize their utility subject to various costs and constraints. Under this
concept, stability can be viewed as an equilibrium surrounding some local optimum. Evolution can
be viewed as the network converging on some new dynamic equilibrium. Random change is still
exogenous to the network and changes the state of agents in the network. If this change results in
some other local optimum, then the network reaches a new stable state. Otherwise, the
network experiences mutation as it converges to a new equilibrium. This concept of
agent-driven optimization is further explored in this chapter as an approach for modeling
complex adaptive social systems.


Network  Utility 

The  concept  of  actor-driven  models  for  network  evolution  was  proposed  by  Snijders 
(1996).  Several  applications  of  this  model  have  been  presented.  Snijders’  concept  of  actor-driven 
models  views  a  network  from  the  perspective  of  individual  agents.  Each  agent  can  control  the  set 
of  outgoing  links  to  other  agents  in  the  network.  His  seminal  assumption  is  that  actors  perform 
myopic  stochastic  optimization  in  continuous  time.  These  changes  are  Markovian  and  depend  on 
network  structure,  attributes,  and  observed  covariates. 
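
A single decision in an actor-driven model is commonly described as a random choice among small changes to one actor's outgoing links, with better outcomes more likely. The sketch below assumes that form, with choice probabilities proportional to the exponential of the objective function, and uses only two terms (number of outgoing links and number of reciprocated links) with made-up weights. The function names and the inclusion of a "no change" option are assumptions for illustration.

```python
import math
import random


def objective(net, i, beta_density, beta_recip):
    """Actor i's objective: weighted count of outgoing and reciprocated links."""
    out_links = sum(net[i])
    reciprocated = sum(net[i][j] * net[j][i] for j in range(len(net)))
    return beta_density * out_links + beta_recip * reciprocated


def ministep_probs(net, i, beta_density, beta_recip):
    """Probabilities of toggling link i -> j (key j) or changing nothing (key None)."""
    utilities = {None: objective(net, i, beta_density, beta_recip)}
    for j in range(len(net)):
        if j == i:
            continue
        net[i][j] = 1 - net[i][j]  # tentatively toggle the link i -> j
        utilities[j] = objective(net, i, beta_density, beta_recip)
        net[i][j] = 1 - net[i][j]  # restore the original network
    highest = max(utilities.values())
    scaled = {k: math.exp(u - highest) for k, u in utilities.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}


def ministep(net, i, beta_density, beta_recip, rng):
    """Let actor i make one myopic, stochastic change to its outgoing links."""
    probs = ministep_probs(net, i, beta_density, beta_recip)
    options = list(probs)
    choice = rng.choices(options, weights=[probs[o] for o in options], k=1)[0]
    if choice is not None:
        net[i][choice] = 1 - net[i][choice]
    return choice
```

For example, with a negative weight on outgoing links and a positive weight on reciprocity, an actor who receives a link from agent 1 is most likely to reciprocate it, less likely to do nothing, and least likely to create an unreciprocated link.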

Social  network  analysts  use  Snijders’  actor-driven  model  to  determine  what  pre-defined 
social  factors  are  important  in  describing  the  evolution  of  empirical  social  network  data.  Snijders 
(2002)  defines  1 1  basic  potential  objective  functions  that  have  some  sociological  meaning: 

1 .  The  density  effect  is  defined  by  the  number  of  links  an  agent  has  to  other  agents  in  the 
network. 

2.  The  reciprocity  effect  is  defined  by  the  number  of  links  to  other  agents  that  are 
reciprocated,  in  that  when  an  agent  links  to  a  target  agent,  that  target  also  links  back  to  the 
original  agent. 

3.  The  transitivity  effect  is  defined  by  the  number  of  transitive  patterns  among  an  agent’s 
connections.  A  transitive  pattern  occurs  when  two  of  an  agent’s  connections  are  connected 
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themselves.  This  is  also  known  as  a  transitive  triplet.  Transitivity  follows  the  logie  that  two 
agents  are  more  likely  to  know  eaeh  other  if  they  have  a  eommon  friend. 

4.  The  balance  effect  is  defined  by  the  similarity  of  outgoing  links  between  an  agent’s 
eonneetions.  This  theory  is  driven  by  the  idea  that  there  are  positive  and  negative  links  and 
an  agent  is  uneomfortable  having  both  relations  simultaneously.  In  other  words  the  enemy 
of  my  friend  should  be  my  enemy  and  the  friend  of  my  friend  should  be  my  friend.  If  I  am 
friends  with  my  enemy’s  friend,  I  will  feel  uneomfortable.  This  effeet  is  highly  eorrelated 
with  the  density  effeet  and  transitivity  effeet.  If  both  are  ineluded  in  a  model  a  eorreetion 
for  the  eorrelation  between  effeets  should  be  ineluded. 

5.  The  number  of  geodesic  distances  of  two  effect  is  defined  by  the  number  of  other  agents 
that  an  agent  is  indireetly  eonneeted  to  through  an  intermediary  agent. 

6.  The  popularity  effect  is  defined  as  the  number  of  links  an  agent  has  eoming  from  other 
agents  in  the  network. 

7.  The  activity  effect  is  defined  as  the  number  of  other  agents  that  ean  be  reaehed  by  an  agent 
in  two  steps. 

8. The main link effect is a covariate effect for links in the network. The other objective
functions might be weighted by certain relationships. For example, a link to an agent of
high prestige or rank might be more valuable than a link to an agent with equivalent status.

9. The related popularity effect is a covariate effect for agents in the network. This is defined
for an agent, i, as the sum of the popularity effect of all other agents connected to agent i.

10. The related activity effect is a covariate effect for agents in the network. This is defined for
an agent, i, as the sum of the activity effect of all other agents connected to agent i.

11. The related dissimilarity effect is a covariate effect for agents in the network. This is
defined as the sum of the differences in some important attribute between an agent and its
direct connections.
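
Several of the structural effects listed above can be read directly off a directed adjacency
matrix. The following is a minimal sketch, not SIENA's implementation: the function names and
the four-agent toy network are assumptions made purely for illustration, and each function
follows the wording of the definitions in this list rather than any particular software package.

```python
# Illustrative sketch: the function names and the toy network are assumptions
# made for this example; they are not taken from SIENA or from the study.

def popularity(adj, i):
    """Effect 6: number of links agent i receives from other agents."""
    return sum(adj[j][i] for j in range(len(adj)) if j != i)


def two_step_reach(adj, i):
    """Effect 7 as worded above: other agents reachable from i in two steps."""
    n = len(adj)
    reached = set()
    for j in range(n):
        if j != i and adj[i][j]:
            for k in range(n):
                if k != i and k != j and adj[j][k]:
                    reached.add(k)
    return reached


def distance_two(adj, i):
    """Effect 5: agents reached only through an intermediary (no direct link)."""
    return {k for k in two_step_reach(adj, i) if not adj[i][k]}


def transitive_triplets(adj, i):
    """Ordered pairs (j, k) of i's connections where j is itself linked to k."""
    out = [j for j in range(len(adj)) if j != i and adj[i][j]]
    return sum(adj[j][k] for j in out for k in out if j != k)


# Toy directed network: adj[i][j] == 1 means agent i has a link to agent j.
adj = [
    [0, 1, 1, 0],  # agent 0 links to agents 1 and 2
    [0, 0, 1, 1],  # agent 1 links to agents 2 and 3
    [0, 0, 0, 0],  # agent 2 has no outgoing links
    [1, 0, 0, 0],  # agent 3 links to agent 0
]

print(popularity(adj, 2))           # links received by agent 2
print(len(two_step_reach(adj, 0)))  # agents within two steps of agent 0
print(len(distance_two(adj, 0)))    # agents at distance exactly two from agent 0
print(transitive_triplets(adj, 0))  # transitive triplets anchored at agent 0
```

Plain nested lists are used so that the sketch has no dependencies; a matrix library would be
the more natural choice for networks of realistic size.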

In addition to having objectives, agents in a network can experience constraints. Agents
can be constrained in the number of links that they can maintain to other agents in the network.
This constraint models cognitive limitations on individuals. A person is not capable of
maintaining meaningful relationships with hundreds of people. Other constraints may be
imposed on the agents in the network. To simplify computation, Snijders does not consider
constraints in his model. When estimating the effects, the density effect often has a negative
coefficient. This is interpreted as an observed constraint on node degree. See Snijders (2002) for
a more thorough explanation. Our aim is to present considerations in multi-agent simulation
based on social network analysis and not to generate a comprehensive model.

Under a network utility model, an agent will change its outgoing links in such a way as to
increase its overall utility, which is equivalent to optimizing its objective function. It is
important to note that the list of objective functions is a set of suggestions and is non-exhaustive.
When tested against empirical data, only a subset of the objective functions may be found to be
significant. Undoubtedly, an analyst could consider other important social factors. Therefore,
when using these objective functions in a multi-agent simulation, the modeler should use some
intuition in determining important effects. Ideally, a modeler could record empirical data, use
Snijders’ actor-driven approach to determine significant objective function effects as his
approach was intended to be used, and then use those effects in a multi-agent simulation to make
inferences about the future behavior of the network.

It is important to point out differences between network utility and classic game theory.
Common applications of game theory tend to focus on trading scarce resources. The network
utility approach does not consider the transfer of resources; rather, agents attempt to optimize
their position in their social network. This approach may not be common in multi-agent
simulation, but it is supported in the social sciences.
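
The utility-seeking step described above, in which an agent changes its outgoing links to
increase its utility, can be sketched as a simple greedy search over single-link changes. This is
an illustration under stated assumptions rather than Snijders' estimation procedure: the
function names, the choice of three effects, and the weights are all hypothetical, and a real
actor-oriented model chooses among changes probabilistically rather than always taking the best.

```python
# Illustrative sketch: names, effects chosen, and weights are assumptions.

def utility(adj, i, weights):
    """Weighted sum of three objective functions evaluated for agent i."""
    n = len(adj)
    out = [j for j in range(n) if j != i and adj[i][j]]
    density = len(out)                          # number of outgoing links
    reciprocity = sum(adj[j][i] for j in out)   # outgoing links that are returned
    transitivity = sum(adj[j][k] for j in out for k in out if j != k)
    return (weights["density"] * density
            + weights["reciprocity"] * reciprocity
            + weights["transitivity"] * transitivity)


def best_single_change(adj, i, weights):
    """Return (j, utility) for the single link toggle i -> j that most improves
    agent i's utility, or (None, current utility) if no toggle improves it."""
    best_j, best_u = None, utility(adj, i, weights)
    for j in range(len(adj)):
        if j == i:
            continue
        adj[i][j] = 1 - adj[i][j]      # tentatively add or drop the link
        u = utility(adj, i, weights)
        adj[i][j] = 1 - adj[i][j]      # restore the original network
        if u > best_u:
            best_j, best_u = j, u
    return best_j, best_u


# Hypothetical weights: links are costly, reciprocated links are rewarding.
weights = {"density": -1.0, "reciprocity": 2.0, "transitivity": 0.5}

# Agent 1 already links to agent 0; agent 0 has no outgoing links yet.
adj = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

choice, value = best_single_change(adj, 0, weights)
print(choice, value)
```

With these weights, agent 0 gains by reciprocating agent 1's link, because the reciprocity
reward outweighs the cost of one more link; linking to agent 2 would only incur the cost.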


Data 


Parity Communications, in collaboration with the Higgins Trust Framework and the
SocialPhysics project, constructed the ELICIT software package. Installed on client computers,
the software serves as the platform for studying organizational efficiency and effectiveness. The
four-phase experiment entails an introduction, a practice round, a one-hour exercise, and a
wrap-up. During both the practice round and the actual exercise, thirty-four subjects are randomly
assigned to one of two organizations: a typical hierarchically arrayed organization (C2) and a
control-free, self-organizing organization (E). These two organizations operate independently for
the duration of the exercises. See Lospinoso (2007) for more information on the experiment and
basic descriptive statistics of the data.

The  goal  of  the  organization  is  to  identify  a  terrorist  attack  based  on  bits  of  information 
distributed  around  the  organization.  After  ten  minutes  of  the  one  hour  experiment,  all  of  the 
correct  information  has  been  issued  to  the  organization.  Among  the  correct  bits  of  information, 
or  factoids,  are  also  distributed  false  factoids.  Each  agent  receives  four  factoids,  and  they  must 
collaborate  within  the  organization  to  come  up  with  the  correct  arrangement  of  who,  what, 
where, and when of the terrorist attack. The C2 group consists of a squad leader, four team
leaders, and twelve team members. Communications among these agents are restricted to the
graph in Figure 6:


Figure  6,  C2  Communications  Hierarchy 

Each team is dedicated to identifying one key element of the terrorist attack: who, what, where,
or when.

The E group consists of seventeen agents with full communication capability across
the organization. There are no defined teams, but the goal remains the same: positively identify
the terrorist attack. All agents have the ability to post their information on their organization’s
website. Within the E group, this website is global to the organization. The C2 group has
separate websites for each echelon (four teams and one squad site). The hierarchy in Figure 1
describes where each agent can post information. Agents can also share information with other
individual agents. Once an agent believes that it knows any number of correct factoids, it can
report its belief through the “identify” function to its immediate superior in the C2 group or to
the entire network in the E group.

Data were collected on two iterations of ELICIT experiments conducted at West Point.
During one iteration, the cadets were allowed to communicate within an edge-network
configuration. In the other, the cadets were required to adhere to a strict hierarchy. Other than
these systemic restrictions, the two iterations were run identically for an actual test run of two
hours.


The participants in this experiment were all cadets at the U.S. Military Academy between
the ages of 17 and 23. The experiment was approved for ethics and safety by the West Point
Institutional Review Board. All participants received a briefing on the experiment, consented to
participate, and had the option to leave the experiment at any time without any adverse impacts.
The investigators conducting the experiment were not in the participants’ military chain of
command, so no undue influence was exerted in this experiment.


Method 

We use the social network software package SIENA (Snijders et al., 2007), which
implements an actor-oriented network model, to analyze data from two iterations of the ELICIT
experiment. Adjacency matrices were constructed to reflect the structure of communication
networks over time. These are unweighted (dichotomous), directed, and non-reflexive square
matrices. We must define time intervals in which to discretize or bin the data. Following the
guidelines set out by Steglich and Snijders (2007), we chose five bins. Each edge Cijt was
assigned a value of one if either of two conditions was met: cadet i sent cadet j
information during time bin t, or cadet i posted information on a team website sometime between
the start of the experiment and time t which cadet j retrieved during time bin t.
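
The binning rule above can be sketched as follows. This is a minimal illustration, not the
code used in the study: the log layouts (`messages`, `posts`, `pulls`), the use of minutes as the
time unit, and the choice of equal-width bins are all assumptions made for the example.

```python
# Illustrative sketch: the log formats and equal-width bins are assumptions.

def bin_index(time, duration, n_bins):
    """Map a timestamp in [0, duration] to a bin number 0 .. n_bins - 1."""
    return min(int(time * n_bins / duration), n_bins - 1)


def build_networks(n_agents, duration, n_bins, messages, posts, pulls):
    """Return one dichotomous, directed, non-reflexive adjacency matrix per bin.

    messages: (sender, receiver, time) tuples for direct shares
    posts:    (poster, site, time) tuples for website postings
    pulls:    (reader, site, time) tuples for website retrievals
    """
    nets = [[[0] * n_agents for _ in range(n_agents)] for _ in range(n_bins)]

    # Condition 1: i sent j information during bin t.
    for sender, receiver, time in messages:
        if sender != receiver:
            nets[bin_index(time, duration, n_bins)][sender][receiver] = 1

    # Condition 2: i posted at any earlier time and j retrieved it during bin t.
    for reader, site, pull_time in pulls:
        t = bin_index(pull_time, duration, n_bins)
        for poster, post_site, post_time in posts:
            if post_site == site and post_time <= pull_time and poster != reader:
                nets[t][poster][reader] = 1
    return nets


# Hypothetical one-hour log for three agents, split into five bins.
messages = [(0, 1, 5.0)]          # agent 0 shares with agent 1 at minute 5
posts = [(2, "team_a", 10.0)]     # agent 2 posts to a site at minute 10
pulls = [(1, "team_a", 30.0)]     # agent 1 reads that site at minute 30

nets = build_networks(3, 60.0, 5, messages, posts, pulls)
print(nets[0][0][1], nets[2][2][1])
```

Note that the posting edge is attributed to the bin in which the retrieval happens, not the bin
in which the post was made, which is how the second condition above is worded.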

Next, we defined covariates. This step is crucial and warrants special attention when
specifying an actor-oriented model under the SIENA framework. Covariates are
empirically derived values that enter directly into four main objective functions (effects
8-11); their parameter estimates can yield critical insight into
important aspects of sociological systems. In the case of the ELICIT data, we identify two main
link effect covariates corresponding to leadership and location. The leadership-link effect is
modeled with a dependence-style network. The leadership network consists of time-invariant
relationships of who was in charge of whom. Note that the leadership network was completely
empty for the edge-organization case, because there were no formally defined leadership roles.
The statistically significant parameter estimates of the leadership-link effect indicate that formal
leadership roles may play a significant part in driving agent behavior. Low, or even
negative, parameter estimates would indicate that agents in the network are averse to forming
links with formal leaders. The location-link effect models geographical proximity. Within the
ELICIT framework, geographic distance may play a significant role within the hierarchical
network, since geographical locations coincide with team placements. It would seem to also be an
important covariate for the agents in the edge network, since agents within the same geographical
region post to the same website and are most likely to gain information from this site. The
statistically significant parameter estimates of the location-link effect indicate a strong affinity
or aversion across both the edge and hierarchical networks on the basis of team cohesion (whether
enforced or not).


In  addition  to  main  link  effect  covariates  defined  on  relationships  between  agents,  we 
also  defined  a  covariate  for  the  information  an  agent  possesses.  As  time  progresses  in  the 
experiment,  agents  gain  bits  of  information.  Once  an  agent  believes  that  the  information  is  true, 
they  will  privately  publish  their  belief  to  the  ELICIT  server,  where  the  belief  can  be  recorded  by 
the  experiment  administrators.  This  is  a  time  varying  effect.  We  use  the  related  popularity  effect 
(number  9)  to  model  this  effect.  Statistically  significant  parameter  estimates  of  the  information 
effect  indicate  that  agents  with  more  information  attract  more  communication  from  other  agents 
in  the  network. 

We also modeled the density effect, the reciprocity effect, and the transitivity effect
(effects 1-3), because they are commonly used in the literature. We elected to omit other
objective functions to prevent overspecification of the model. See Steglich and Snijders (2006)
for a more comprehensive review.


Results 


To estimate the parameters of both the edge and hierarchical treatments simultaneously,
we compiled both adjacency matrices and covariates into large matrices with structural holes
where appropriate. We conducted estimation procedures within SIENA using default parameters
and 1000 iterations of the three-stage Metropolis-Hastings Markov chain Monte Carlo procedure.
Table 5 and Table 6 display the parameter estimates of the E and C2 networks respectively.
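
One way to compile two separate networks into a single large matrix is a block-diagonal
layout in which ties between the two treatments are marked as impossible. The sketch below
reads the "structural holes" mentioned above as what SIENA calls structural zeros, and uses
SIENA's convention of coding such entries as 10; both the reading and the layout are assumptions,
since the report does not state the exact coding that was used. The function name is hypothetical.

```python
# Illustrative sketch: the block layout and the use of code 10 are assumptions.

STRUCTURAL_ZERO = 10  # SIENA's code for a tie that cannot exist


def stack_networks(edge_adj, hier_adj):
    """Place two adjacency matrices on the diagonal of one larger matrix,
    marking every cross-treatment entry as a structural zero."""
    n_e, n_h = len(edge_adj), len(hier_adj)
    n = n_e + n_h
    combined = [[STRUCTURAL_ZERO] * n for _ in range(n)]
    for i in range(n_e):
        for j in range(n_e):
            combined[i][j] = edge_adj[i][j]
    for i in range(n_h):
        for j in range(n_h):
            combined[n_e + i][n_e + j] = hier_adj[i][j]
    return combined


# Two toy two-agent networks standing in for the E and C2 groups.
edge_adj = [[0, 1],
            [0, 0]]
hier_adj = [[0, 0],
            [1, 0]]

combined = stack_networks(edge_adj, hier_adj)
for row in combined:
    print(row)
```

In the study each group had seventeen agents, so the corresponding combined matrix would be
34 by 34 for each time bin.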


Table 5. Parameter Estimates for Edge Network

Measure                   Parameter Estimate (p-val)
Density Effect            -.3693 (.028)
Transitivity Effect       .2054 (.031)
Reciprocity Effect        .1502 (.070)
Location-link Effect      .0513 (.471)
Leadership-link Effect    n/a
Information Effect        .2146 (.009)


Table 6. Parameter Estimates for Hierarchical Network

Measure                   Parameter Estimate (p-val)
Density Effect            -.9976 (.035)
Transitivity Effect       .2007 (.044)
Reciprocity Effect        .0640 (.36)
Location-link Effect      .2632 (.017)
Leadership-link Effect    .1507 (.023)
Information Effect        -.1647 (.019)

We estimate six important objective functions to determine what sort of utility profiles
are recurrent in each of the networks. After separating out the effects of each of the networks
using individual covariate dummy variables, we find that the density effect measure is negative
and statistically significant, which corresponds with our intuition that there is some sort of
underlying cost to adding edges. Within the edge network, this effect is considerably smaller in
magnitude, which may indicate that agents in the edge network either have more cognitive capacity
to form ties or that they are empowered by a lack of formal hierarchical structure. In the
hierarchical network, the magnitude of this estimate (nearly -1), compared to the relative size of
the other objective functions, indicates that there are strong limitations to the cognitive
capacity of the agents within that network.

The transitivity effect has a strong, statistically significant, positive parameter estimate.
Agents in both of these networks tend to close triads, which confirms our intuition for the
hierarchical network, where team members might be expected to close triads within their teams.[2]
The estimates are rather stable across the edge/hierarchical treatment, and it would appear that
there is little difference between the two utility profiles.

[2] Closing triads refers to the act of forming a relationship with a friend of a friend.

The reciprocity effect has little effect within the hierarchical network, but it has a larger,
marginally significant effect (p = .070) in the edge network. Reciprocity tells us how likely one
node is to return information to the entity who sent it information. This supports our intuition
that in an edge network, relationships are created on the basis of information necessity and all
agents must cross-load information. Within the hierarchical network, team leaders can ask for
information and receive information without ever having to inform their teams what is going on,
so the edges are not reciprocated (which is why we fail to have statistically significant results
under the hierarchical network).

The location-link effect is statistically significant in the hierarchical network. This may be
a result of location and team membership being highly correlated. When two agents in the
hierarchical network are within a team, their team leader tasks them with determining one of the
factoids, so it is natural that collaboration here should become important. Within the edge
network, there is no statistically significant estimate for location. What this indicates is that
within the edge network, covariates of initial team membership mean little and agents quickly
break out of their location to connect with the other locations and help contribute to their
knowledge base.

Leadership-link  effect  was  estimated  for  the  hierarchical  network  and  had  a  strong, 
positive  estimate.  This  indicates  that  the  leadership  role  could  explain  a  large  portion  of  variation 
in  the  communication  patterns  of  the  hierarchy.  It  both  supports  our  intuition  and  supports  the 
notion  that  leadership  within  the  hierarchy  was  effective  at  promoting  information  sharing  up 
and  down  the  chain. 

Information effect parameter estimates differed considerably between the edge and
hierarchical treatments. Within the hierarchical network, there was actually a strong, negative
correlation between people who had assembled information into some sort of conclusion and
others. This suggests that there is information hoarding going on in the hierarchical network,
with the leadership hoarding the information. Within the edge network, people who have assembled
information seem to attract many edges. We cannot establish causality directly from this estimate
(i.e., it could be that the entity has information because he is highly interconnected, or that he
is interconnected because he has information), but information sharing within the network
appears to be a significant behavioral driver.

There are some striking differences in the behavior of these two networks. First,
information sharing and collaboration occur much more within the edge network, while
leadership seems to drive much of the behavior in the hierarchical network. Agents in the edge
network tended to develop sharing relationships much more than in the hierarchical network, as
evidenced by the high reciprocity and triad closure in the edge network. Finally, it appears that
edge network agents had fewer constraints on collaboration en masse, as indicated by the
magnitude of their density effect estimates.


Discussion 

Defense agencies of the future will increasingly rely on an understanding of complex
systems. From understanding the asymmetric (non-hierarchical) nature of armed adversaries to
engineering net-centric systems that maximize efficiency and effectiveness, researchers have
benefited and will continue to benefit from empirical studies of complex systems, whether social,
physical, or biological. For a thorough review of this active area of research, the reader is
referred to Alberts (2002).


We utilized an actor-oriented specification of a complex social system as opposed to an
aggregated, holistic assessment of the system, and as a result we were able to dig into the
underlying behavioral mechanics of the network and truly understand what is driving the
autonomous, intelligent behavior of the cadets in the study. We now understand that soldiers
within net-centric edge networks do collaborate across geographic and formal boundaries as
expected but, more importantly, that their behavior is driven by the need to accumulate knowledge
and settle into comfortable social patterns (such as triad consensus and reciprocity).

Beyond contributing to sociological literature and the defense industry's understanding of
net-centric operations and systems, this chapter has introduced actor-oriented models in social
network analysis which identify statistically significant utility seeking behavior within empirical
data. The study of complex, adaptive systems can benefit from this empirical framework by
permitting the investigator a deep look into the underlying mechanics that drive network
structure. Enabled with these tools, there is a considerable array of future directions that
investigators can pursue to enrich our understanding of complex systems.

Parameter estimates from an actor-oriented specification as outlined in this chapter can be
used to drive a multi-agent simulation. Moreover, the approach laid out in this chapter allows a
modeler to use empirical data to determine factors driving agent interaction within a simulation.
Building a simulation based on statistically significant findings within empirical data is an
important aspect of model verification.

This  approach  requires  that  multi-agent  simulation  frameworks  are  capable  of  modeling 
significant  utility  seeking  behavior.  It  is  important  to  note  that  functions  driving  agent  behavior 
may  differ  among  differing  applications.  In  the  ELICIT  example,  different  objective  functions 
were  significant  for  the  edge  and  hierarchical  networks,  even  given  highly  homogeneous  sets  of 
agents.  This  implies  that  there  is  no  one  model  that  fits  all  applications. 

An example of a flexible multi-agent simulation is Construct, which is the multi-agent
simulation presented earlier. In the context of Actor Oriented Models, Construct models agent
interaction by assigning probabilities of link formation between agents at each time step. The
probability of link formation is determined by a weighted function of homophily,
socio-demographics, and proximity. Throughout the simulation, agents interact, share knowledge, and
change in various attributes as a result of interaction with other agents. Within the framework
laid out in this chapter, homophily is equivalent to transitivity, reciprocity, balance, and the
information effect. Socio-demographics are equivalent to the number of geodesics of two effect,
the popularity effect, and the activity effect, as well as some covariate effects. Proximity is
equivalent to a main-link effect. Other effects can be incorporated into the Construct model as
well. While a detailed explanation of Construct is beyond the scope of this chapter, we point out
that it is an example of a multi-agent simulation framework that can be used to simulate
empirically observed network data. The statistically significant parameter estimates of the
actor-oriented model can be used to provide weights to the functions that determine the
probability of link formation between agents. In this manner, the predictive power of the
multi-agent simulation is enhanced due to its similarity to empirical data. Future work should
explore the ramifications of resolving utility profiles into probability profiles.
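
One possible way to resolve a utility profile into a probability profile is a multinomial
logit (softmax) over the single-link changes available to an agent, which is the form of choice
rule used in actor-oriented models. The sketch below is an illustration only: the function
names are hypothetical, only two effects are included, and whether Construct would consume
probabilities in this form is an assumption. The weights are the edge-network density and
reciprocity estimates from Table 5.

```python
import math

# Illustrative sketch: function names and the two-effect objective are assumptions.


def objective(adj, i, weights):
    """Weighted density and reciprocity for agent i."""
    out = [j for j in range(len(adj)) if j != i and adj[i][j]]
    density = len(out)
    reciprocity = sum(adj[j][i] for j in out)
    return weights["density"] * density + weights["reciprocity"] * reciprocity


def change_probabilities(adj, i, weights):
    """Return {candidate: probability}, where candidate j means toggling the
    link i -> j and candidate None means leaving the network unchanged."""
    scores = {None: objective(adj, i, weights)}
    for j in range(len(adj)):
        if j == i:
            continue
        adj[i][j] = 1 - adj[i][j]               # tentatively toggle the link
        scores[j] = objective(adj, i, weights)
        adj[i][j] = 1 - adj[i][j]               # restore the original network
    top = max(scores.values())                  # subtract the max for stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}


# Edge-network estimates for density and reciprocity, taken from Table 5.
weights = {"density": -0.3693, "reciprocity": 0.1502}

# Agent 1 already links to agent 0; agent 0 has no outgoing links yet.
adj = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

probs = change_probabilities(adj, 0, weights)
for candidate, p in probs.items():
    print(candidate, round(p, 3))
```

Under these two weights alone, making no change is the most likely outcome for agent 0, and
reciprocating agent 1's link is more likely than opening an unreciprocated link to agent 2. A
fuller profile would include the remaining estimated effects.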

An empirically grounded multi-agent simulation also contributes to better understanding
network dynamics. This chapter serves to unify competing approaches to modeling network
evolution. Future work may explore opportunities to introduce random change into the simulated
networks. Realistic simulation of networks allows investigators to explore network dynamics by
introducing various forms of evolutionary and random change at known points in time and
observing their behavior. This is necessary for exploring networks over time.

The approach presented in this chapter is still limited in several ways. The list of
objective function effects outlined in Section 3 is not exhaustive. There are likely other important
utility seeking functions governing agent interaction. Some effects are highly correlated, and
including too many effects may lead to overspecified or degenerate models. Future work may
investigate additional objective functions for actor-oriented models.

Multi-agent system researchers should be motivated to apply an actor-oriented approach
to empirical network data. The determination of statistically significant utility seeking behavior
in networks offers us a deep, complexity-preserving insight into the underlying behavior of
social systems. Whether the information is used at face value to draw inferences about
sociological, physical, and biological phenomena, or utilized as an intermediary to simulation
analysis, empirical analysis of the utility seeking behavior characterizing the complex networks
around us promises to deepen our understanding of them.


CONCLUSION 

This paper has presented various models of social networks as longitudinally observed
phenomena, including the Link Probability Model, the Actor Oriented Model, and the
Exponential Random Graph Model. Along the way, statistical methods were developed to
differentiate among network models and to determine the accuracy of the models. After some
analysis against empirical data from both classical literature and studies conducted at the US
Military Academy, it was determined that the LPM produces results that differ less from empirical
data than those of the ERGM in these circumstances. We further conducted experimental studies using
the ELICIT framework to test the effectiveness of the Actor Oriented Model at identifying
statistically significant social theories present in the data. Finally, we found that both the LPM
and the AOM fit into the social theory framework of constructuralism, which is implemented in
the simulation package Construct.


Limitations 

There  are  limitations  on  each  of  the  modeling  techniques  employed.  The  LPM  assumes 
dyadic  independence,  which  is  clearly  not  true  in  some  circumstances  of  network  evolution.  If 
the  network  under  study  is  in  a  dynamic  equilibrium,  however,  we  have  found  that  the  LPM 
performs  well  at  estimating  the  likelihood  of  interactions.  ERGM  and  AOM  also  have  limitations 
in  that  they  assume  a  memoryless  property  inherent  in  all  Markov  graph  models.  As  we  have 
explored  in  the  simulation  chapter,  there  are  also  some  very  specific  assumptions  made  with 
constructuralist  theory: 

1a. Individuals, when interacting with other individuals, can communicate information.

1b. Individuals, when interacting with other individuals, can acquire information.

1c. Individuals can learn the newly acquired information, thus augmenting their store of
knowledge.

2a.  Individuals  select  interaction  partners  on  the  basis  of  relative  similarity  and 
availability. 

2b.  Individuals  engage  in  interaction  concurrently  thus  an  individual's  first  choice  of 
interaction  partner  may  not  be  available. 

3a.  Individuals  have  both  an  information  processing  capability  and  knowledge  which 
jointly  determine  the  individual's  behavior. 

3b.  Individuals  have  the  same  information  processing  capabilities. 

3c.  Individuals  differ  in  knowledge  as  each  individual's  knowledge  depends  on  the 
individual's  particular  socio-cultural-historical  background. 

3d.  Individuals  can  be  divided  into  types  or  classes  on  the  basis  of  extant  knowledge 
differences. 

When these assumptions do not hold, there will likely be error in the results obtained from
utilizing these methods. Unfortunately, little is known about the nature of these biases.


Contributions


Contributions  of  this  paper  include  preliminary  analysis  of  the  effectiveness  of  the 
ERGM,  AOM,  and  LPM  at  modeling  longitudinal  networks,  a  statistical  test  to  compare  the 
effectiveness  between  these  models  and  empirical  data,  and  an  interface  for  taking  model 
parameters  to  teach  a  simulation  how  to  represent  the  real  world  data.  Much  of  this  literature 
until  now  has  existed  in  mutually  exclusive  areas  of  SNA.  This  paper  serves  to  unify  these  areas 
and  provide  a  framework  and  tools  to  bridge  between  them. 

Future  Work 

There are many opportunities for future work. AOM must be compared against LPM and
ERGM with empirical datasets. There is much work to be done in implementing an actual
interface into Construct that could take empirical data and apply the SNA models with their
estimation techniques directly into a simulation. In this way, researchers could obtain a body of
data and seamlessly create accurate simulations of that data by specifying the appropriate
multi-agent simulation model.

SNA is a rapidly expanding research area, and collaboration between social theory
practitioners, statisticians, and modelers can capitalize on this expansion. This paper has
illustrated how this collaboration can occur by spanning all three areas. As richer empirical data,
more accurate models, and better estimation techniques become available, synthesizing them into
unified suites of tools promises to deepen our understanding of networks and provide researchers
with valuable and powerful insight into the social systems around us.